Text File | 1995-09-01 | 208KB | 4,792 lines

This is my hardware DOC file. It bears no relation to the stuff in any of the Amiga manuals to my knowledge, but draws from the Amiga System Programmer's Guide by Abacus for much important material. Also, I include some example pieces of code of my own, routines plus data structures, illustrating how to manipulate the hardware directly.

List of Topics:

        Hardware:Preamble
        Hardware:The CIA Chips
        Hardware:DMA
        Hardware:Interrupts
        Hardware:The Copper
        Hardware:Bitplanes & Bitplane Control
        Hardware:Sprite Management
        Hardware:The Blitter
        Hardware:Sound Management
        Hardware:Disc Management
        Hardware:Interfaces
        Hardware:Mouse, Keyboard, Joysticks
        Hardware:Some Notes
        Hardware:Logic Tutorial (for mathematically skilled readers & masochists)


Hardware:Preamble

This DOC file is somewhat oddly organised. This is because several different sections contain material that cross-references across topic headings. Where a cross-reference exists, I'll signpost it using the # character followed by a number. If you use Devpac to view this file, you can skip across the cross-references using the search facility.

The main base addresses for the chips are $BFD000 and $BFE001 for the CIAs, and $DFF000 for the gee-whizz custom chips. The custom chips are organised as a linear block of word addresses from the base address onwards, and the CIA register addresses occur at 256-byte intervals from the base address onwards. Other than that, there will occur extra pieces of information relating to the way that several of the chips are connected to various hardware interfaces such as the keyboard and mouse. They are the main reason for the implementation of the cross-referencing scheme in this file, because the CIAs in particular are connected to several different actual hardware interfaces.

I find it best to use instructions of the form

        MOVE.W  #DATA,register(A0)

when accessing the chips, because there's less time overhead than with those instructions using absolute addresses.
In the above case, A0 contains the base address of the chip set being accessed.

One more point. The 68000 CLR instruction issues a read access and then a write access signal across its bus. Other 680x0 chips from the 68020 onwards perform a single write access only. Because of this, using something like

        CLR     COPJMP1(A0)

is not recommended when accessing strobe registers (see later for an explanation of what these are), even though it's shorter than using

        MOVE.W  #0,COPJMP1(A0)

Those blessed with assemblers capable of generating 68020/68030/68040 code and inline coprocessor code for the 68881 and 68882 maths coprocessors should NOT use these facilities when accessing the custom chips, in particular if they're lucky enough to have a 680x0 accelerator board fitted. Save these facilities for accelerator-board specific code only. Also be warned about invalid cache entries on these processors if the DMA from the custom chips scribbles all over memory when the accelerator board processor is attempting an access at that address range!

Given the propensity of games writers in particular to tell AmigaDos to take a hike, I issue the following caveats: using MOVE to SR to kill off interrupts doesn't work properly. The interrupts are handled by the 4703 custom chip and generated anyway. All that happens is that the 68000 ignores them, and the relevant bits in the 4703 control registers don't get handled properly. This can cause problems. Also, supervisor mode doesn't provide any real advantages over user mode (although I'm guilty of abusing it sometimes). If you're handling your own exceptions directly, take account of differences in exception stack frames, especially if you've got a 680x0 accelerator board! I recently found out that a gee-whizz 68040 board is now available with 2 megs of memory from a company in Pasadena. £2000 and it's yours. Also, if you are the sort of programmer who pees around with the ROM, watch out!

One final point.
To save typing out addresses of the form $DFFxxx in full for custom chip register addresses, I'll refer to them via offsets from the base address of $DFF000. It's a VERY GOOD idea to do this in programs as well! Load, say, A5 with $DFF000 and access the register of your choice using something like

        MOVE.W  #DATA,REGISTER_NAME(A5)

instead of littering code with absolute addresses (you'll also save 4 cycles of processor time per access!). Anyone who has encountered the dread set of files called 'my_hardware.i' etc., which I created for my own use & recently distributed to others, will find all of the custom chip registers defined in name form as offset equates from the base address of $DFF000, and any programs using these include files access the custom chips in the manner above.


Hardware:The CIA Chips

There are two of these beasties on the Amiga. They're referred to as CIA-A & CIA-B respectively. CIA-A lives at $BFE001 onwards, CIA-B at $BFD000 on. I now present the register list. The formula for the address of each register is

        register number*256+CIA base address

and all CIA registers are a single byte wide.

        Register   Name
        --------   ----
        0          PRA    (Port A Data Register)              R/W
        1          PRB    (Port B Data Register)              R/W
        2          DDRA   (Port A Data Direction Register)    R/W
        3          DDRB   (Port B Data Direction Register)    R/W
        4          TALO   (Timer A lower 8 bits)              R/W
        5          TAHI   (Timer A upper 8 bits)              R/W
        6          TBLO   (Timer B lower 8 bits)              R/W
        7          TBHI   (Timer B upper 8 bits)              R/W
        8          E.LSB  (Event counter lower byte)          R/W
        9          E.MID  (Event counter middle byte)         R/W
        A          E.MSB  (Event counter high byte)           R/W
        B          not used
        C          SP     (Serial port data register)         R/W
        D          ICR    (Interrupt control register)        R/W
        E          CRA    (Control register A)                R/W
        F          CRB    (Control register B)                R/W

Note: the 'serial port' on the CIAs is NOT the same as the Amiga serial port at the back! This is an internal serial port, and on the CIA-A it is connected to the 6500/1 processor handling the keyboard (#1).
The main serial port for external communications is handled via Paula (#2).

Data registers: the CIA data registers are connected in a seemingly odd manner to various different pieces of hardware. They can be treated bitwise as input or output registers. If the corresponding DDRx bit is 0, the same bit on the PRx is an input, and an output if the DDRx bit is a 1. Writing to a data register stores the value in it, and reading it reads the state of those port lines to which it is connected.

There is a simple handshaking mechanism via two lines, called PC and FLAG. PC goes low for 1 clock cycle on each access to PRB. The FLAG input responds to such transitions. Every time the state of FLAG changes from 1 to 0, the FLAG bit in the ICR is set. These two lines allow a simple form of handshaking in which the PC and FLAG lines of the CIAs are cross-connected. The sender need only write its data to the port register & then wait for a FLAG signal before sending each additional byte. Since FLAG can generate an interrupt if wanted, the sender can perform other tasks while waiting for the FLAG signal, and continue the sending via an interrupt routine. The same also applies to the receiver, except that it reads the data from the port instead of writing to it.

CIA-A has its PRB connected to the parallel port (more correctly, the Centronics port data lines), and CIA-B has a whole series of awkward connections for PRB as follows (Cross-reference #6):

        Bit     Connection
        ---     ----------
        7       /MTR    motor signal to disc drive
        6       /SEL3   drive 3 select
        5       /SEL2   drive 2 select
        4       /SEL1   drive 1 select
        3       /SEL0   drive 0 select
        2       /SIDE   drive side select
        1       DIR     data direction signal to disc drive
        0       /STEP   step signal to drive head stepper motor

Both CIAs have awkward connections for PRA.
They are (CIA-A first, then CIA-B):

        Bit     Connection (CIA-A PRA)
        ---     ----------
        7       Game port 1 pin 6 (fire button)
        6       Game port 0 pin 6 (fire button)
        5       /RDY    disc ready signal from disc drive
        4       /TK0    disc track 00 signal from disc drive
        3       /WRPO   write protect signal from disc drive
        2       /CHNG   disc change signal from disc drive
        1       LED     power LED (0=on, 1=off)
        0       OVL     memory overlay bit (DANGER!!!!)

        Bit     Connection (CIA-B PRA)
        ---     ----------
        7       /DTR    DTR signal from serial interface
        6       /RTS    RTS signal from serial interface
        5       /CD     CD signal from serial interface
        4       /CTS    CTS signal from serial interface
        3       /DSR    DSR signal from serial interface
        2       SEL     select signal for Centronics interface
        1       POUT    paper out signal from Centronics interface
        0       BUSY    busy signal from Centronics interface

When I said the connections were awkward, I wasn't kidding.

Control registers: the two control registers determine the mode of operation of many of the other registers. CRA bit allocations are:

        Bit     Function
        ---     --------
        7       Not used
        6       SPMODE  serial port register mode 0=input, 1=output
        5       INMODE  0=clock, 1=CNT (see later)
        4       LOAD    1=force load (strobe) for timer A
        3       RUNMODE 0=continuous, 1=one-shot for timer A
        2       OUTMODE 0=pulse, 1=toggle
        1       PBON    0=PB6 off, 1=PB6 on
        0       START   0=off, 1=on

CRB bit allocations are:

        Bit     Function
        ---     --------
        7       ALARM   0=TOD (time of day - see later), 1=ALARM
        6,5     INMODE  00=clock, 01=CNT,
                        10=timer A input, 11=timer A + CNT
        4       LOAD    1=force load (strobe) for timer B
        3       RUNMODE 0=continuous, 1=one-shot for timer B
        2       OUTMODE 0=pulse, 1=toggle
        1       PBON    0=PB7 off, 1=PB7 on
        0       START   0=off, 1=on

Timers: this is one of the complicated things. These timers can count down from a preset value to zero. When reading, they are treated as TALO/TAHI and TBLO/TBHI. When writing, they are treated as latches PALO/PAHI and PBLO/PBHI.
The latches should be loaded, low byte first, as a write access to the high register causes the timer to be stopped and reloaded with the latch value unless the LOAD bit in the control register is set, in which case the latch value is transferred to the timers regardless of the timer state. The LOAD bit in the CRA/CRB registers is a strobe bit. Writing simply causes the given action to be performed; the bit value is not stored as such. When the timer hits zero, the latch values are automatically put back into the timer. The latch is also known as the prescaler. The number of timeouts per second in continuous mode is equal to the clock frequency divided by the value in the prescaler/latch. Timer A is connected to the processor E clock, the frequency being 716KHz.

Reading the timer can cause problems, because it has to be performed in two separate read operations. One problem is that the timer could change its state between the two reads. For example, if the timer bytes combined are $0100 in value, and the high byte is read, the timer could decrement to $00FF before the low byte is read, resulting in a final read value of $01FF, which is incorrect (it should be $00FF). Stopping the timer from decrementing during the read is one method of preventing this, but this is not the most elegant way. Instead, read the high byte, then the low byte, then read the high byte again into a separate register. If the two high byte reads are the same, take this as the value. If not, repeat the process. It should only need one repeat read to obtain the correct result, but in any case the following code should obtain the correct value (code is for CIA-A):

                LEA     $BFE001,A0      ;CIA-A base address
        get_timer
                MOVE.B  $500(A0),D1     ;get TAHI
                MOVE.B  $400(A0),D0     ;get TALO
                MOVE.B  $500(A0),D2     ;get TAHI again
                CMP.B   D1,D2           ;high bytes match?
                BNE.S   get_timer       ;repeat read if not
                NOP                     ;here, D1/D0 hold the correct timer!

The INMODE bits (bit 5 of CRA, bits 5 and 6 of CRB) control the timer functions. For timer A, there are two possibilities.
Either CRA INMODE=0, in which case timer A is decremented during each clock cycle on the processor's E clock line, or else CRA INMODE=1, in which case each high pulse on the CNT input line decrements the timer.

Timer B has four possible operating modes, governed by the INMODE bits of CRB. First there is INMODE=00, in which timer B decrements each time a pulse occurs on the E clock line (connected to the phi2 clock pin of the CIA) just as for timer A INMODE=0. INMODE=01 decrements timer B every time there is a pulse on the CNT line. INMODE=10 allows timers A and B to be combined to form a 32-bit timer, and timer B decrements every time a timeout signal from timer A is received. In this mode, timer A forms the low word of the 32-bit timer, and timer B the high word. Finally, INMODE=11 allows the length of a pulse on the CNT line to be measured. In this mode, timer B counts the timeouts from timer A only while the CNT line is high. The timeout signals are recorded in the interrupt control register or ICR, to be dealt with later.

Two output modes for the timers can be selected with the OUTMODE bit of the control registers. OUTMODE=0 causes timeout signals to appear as a positive pulse one clock period long on the corresponding port line (PB6 for timer A, PB7 for timer B, switched on via PBON). When OUTMODE=1 (toggle mode), each timeout causes the corresponding port line to change value from low to high or high to low. Each time the timer is started in this mode, the output starts as a high signal.

The RUNMODE bit determines whether the timer operates in one-shot mode (RUNMODE=1) or operates continuously (RUNMODE=0). In one-shot mode, the timer stops after each timeout and sets the START bit to 0. In the continuous mode the timer restarts after each timeout automatically.

Timer A of CIA-A is used by the operating system for communication with the keyboard, and timer B by the operating system for some other tasks.
Timer A of CIA-B is used for serial data transfers, otherwise it is free, and timer B is used to synchronise the blitter with the screen by the OS, otherwise it is free.

Interrupt Control: this is actually implemented as two registers, one being a read register and one being a write register. The bit allocations are given in the tables below, read register first. The read register is the ICR data register, the write register the ICR enable mask register.

        Bit     Function (read: ICR data register)
        ---     --------
        7       IR - Interrupt received/signalled
        6       not used, always 0
        5       not used, always 0
        4       FLAG - PRB port handshaking
        3       SP - serial port needs attention
        2       Alarm signal
        1       TB - Timeout for Timer B
        0       TA - Timeout for Timer A

        Bit     Function (write: ICR enable mask register)
        ---     --------
        7       Set/Clr bit (explained below)
        6       not used, send a 0
        5       not used, send a 0
        4       FLAG input enable
        3       SP - serial port interrupt enable
        2       Alarm input enable
        1       Timer B timeout enable
        0       Timer A timeout enable

The set/clr bit is used as follows. Let us assume that the byte to be written to the ICR is %00010011. Bit 7 (the set/clr bit) is 0. This tells the input latch for the ICR to clear all bits in the ICR write register corresponding to the 1 bits in the byte just written to it, and to leave all other bits unchanged. This particular write byte has bits 4, 1 and 0 set, so the FLAG, TB and TA interrupt inputs are disabled. If the byte was %10010011 instead, the set bit 7 (corresponding to set/clr on the ICR) would tell the input latch for the ICR to set all bits corresponding to the other 1 bits in the byte just written, and leave all of the others unchanged. This write byte would enable the sources disabled by the previous write byte example. Note that this set/clr mechanism occurs throughout the Amiga chip set, in particular for DMA control (#3), main system interrupt control (#4) and other functions.
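
As a small illustration of the set/clr write described above, the following sketch first masks off all five CIA-A interrupt sources and then enables the Timer A timeout alone. The equate names CIAA and ICR are my own labels for this example (ICR is simply register $D times 256); they are not defined anywhere earlier in this file. AmigaDos relies on CIA-A interrupts for its own purposes, so treat this as a fragment for code that already owns the machine.

```asm
CIAA    EQU     $BFE001             ;CIA-A base address
ICR     EQU     $D00                ;assumed name: register $D * 256

        LEA     CIAA,A0
        MOVE.B  #%00011111,ICR(A0)  ;bit 7 clear: disable FLAG,SP,Alarm,TB,TA
        MOVE.B  #%10000001,ICR(A0)  ;bit 7 set: enable TA (bit 0) only
```

Neither write disturbs mask bits whose position holds a 0, which is why the second write leaves the other four sources disabled.
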
Having written a control byte to enable various interrupt sources (CIA-local interrupt sources, NOT main system ones!), we can now read the same address and the chip will return the value of the ICR data register, which contains bitwise information about which interrupt source triggered the CIA's internal interrupt mechanism. Note that any interrupt handled via the CIAs is further passed on to the system as a whole, to allow 68000 autovectored interrupt code to handle the interrupt. More about this later.

If this value is needed for multiple interrupt source tests, then it must be saved, because reading the ICR data register causes the CIA to clear it after the read. The IR bit is the interrupt request bit, and if set, indicates that a valid CIA internal interrupt was triggered. But this and all of the other ICR data bits are cleared upon reading, so save the result of the read, as this code does:

        LEA     $BFE001,A0              ;CIA-A base addr
        MOVE.B  $D00(A0),D0             ;get ICR data reg value
        MOVE.B  D0,last_icra_read(A6)   ;save in my variable block!

If you fail to save the result as above and then obliterate the unprocessed bits, you have lost those unprocessed bits forever. You have been warned!

The CIA signals an interrupt as follows. Whenever one of the various interrupt sources sets its corresponding bit in the ICR data register, the CIA checks to see if the corresponding ICR mask enable bit is also set. If this is so, the CIA pulls the IRQ line low to signal the interrupt in hardware, and then sets the IR bit (bit 7) in the ICR data register to signal the interrupt in software also. How the main system treats this signal is handled later. The IRQ line does not return to the high state (its normal state) until the ICR data register is read (and hence the IR bit, and all others, are cleared by the CIA).

Event Counter: this event counter differs from the 6526 Time-Of-Day counter found, for example, in the Commodore 64 (yeuk!).
This is almost the only difference between the 6526 and the 8520 CIA. Those unfortunate enough to have encountered the 6526 in past programming will breathe a sigh of relief. There is one slight problem with the documentation that Commodore supply: they insist on referring to TOD (time of day) in the literature, even though this term is meaningless here.

Instead of the real-time clock which counts hours, minutes and seconds on the 6526, the 8520 has a simple 24-bit event counter. It takes its input signal on the TOD line (jeez, Commodore!) of the chip. The event counter starts at zero (or some other predefined state written to it) and counts UPWARDS to $FFFFFF, before returning to zero. The event counter consists of the counter proper and a latch register. When the high byte is read, the actual counter state is transferred completely to the 24-bit latch, and the high byte of the latch returned. The counter continues counting undisturbed while the remaining latch registers are read, mid-byte next, then low-byte. ALWAYS read in the order high, mid, low or else this won't work!

Writing a value to the event counter causes the CIA to stop the counter until the entire value is written, PROVIDED that it is written to in the same order as it is read from above, namely high, mid, low. Once the low byte is written, the counter starts up again, with the written value as the start value. When writing the event counter value in normal mode, be sure that the TOD/ALARM bit of the CRB control register is cleared!

An alarm function exists also. SET the Alarm bit in CRB as opposed to clearing it as for normal event counter writing, and write a value into the event counter. The chip will take this value as an alarm value, and if the event counter value ever matches this alarm value, the alarm bit of the interrupt control register is set.
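
The read order and the alarm write described above might be sketched as follows. All of the equate and label names are my own choices for illustration, with offsets taken from the register list (register number times 256), and the routines assume code that already owns CIA-A, since the OS normally uses this counter itself. The alarm only raises an interrupt if the Alarm enable bit (bit 2) of the ICR mask has also been set.

```asm
CIAA    EQU     $BFE001         ;CIA-A base address
E_LSB   EQU     $800            ;event counter low byte  (register 8)
E_MID   EQU     $900            ;event counter mid byte  (register 9)
E_MSB   EQU     $A00            ;event counter high byte (register A)
CRB     EQU     $F00            ;control register B      (register F)

read_events                     ;out: D0 = 24-bit counter value
        LEA     CIAA,A0
        MOVEQ   #0,D0
        MOVE.B  E_MSB(A0),D0    ;high byte first: latches all 24 bits
        LSL.L   #8,D0
        MOVE.B  E_MID(A0),D0    ;then the middle byte
        LSL.L   #8,D0
        MOVE.B  E_LSB(A0),D0    ;low byte last
        RTS

set_alarm                       ;in: D0 = alarm value ($00HHMMLL)
        LEA     CIAA,A0
        BSET    #7,CRB(A0)      ;ALARM=1: writes now set the alarm value
        SWAP    D0              ;low byte of D0 = HH
        MOVE.B  D0,E_MSB(A0)    ;high byte first
        SWAP    D0
        ROR.W   #8,D0           ;low byte of D0 = MM
        MOVE.B  D0,E_MID(A0)
        ROR.W   #8,D0           ;low byte of D0 = LL
        MOVE.B  D0,E_LSB(A0)    ;low byte last
        BCLR    #7,CRB(A0)      ;back to normal event counter writes
        RTS
```

The caller of set_alarm keeps its own copy of the value in D0, which is left unchanged on return.
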
The value of the alarm setting CANNOT be read, only written - reading the event counter always returns the current counter latch value, regardless of the state of the TOD/Alarm bit in CRB. So if you want to provide known alarm values, save them somewhere safe before using them!

Serial Port: the serial port consists of the serial data register (which is readable and writeable) and the shift register (not directly accessible). Setting SPMODE=0 in the control register sets the serial port to input mode and SPMODE=1 to output mode.

In input mode, the serial data on the SP line are shifted into the shift register after each rising edge on the CNT line. After 8 CNT pulses the shift register is full, and the data is transferred to the serial data register. At the same time, the SP bit in the ICR data register is set. If more CNT pulses are received, the data continues to shift into the shift register until it is full again. If the user has read the serial data register before this, the data is transferred across again, else it is lost. When using this register for input, respond to it reasonably promptly!

In output mode, timer A is used to determine the send frequency. The timeout rate of timer A (which must be operated in CONTINUOUS mode) controls the baud rate of the transfer. The data are shifted out of the shift register at half the timeout rate of timer A, whereby the maximum output rate is 1/4 of the clock frequency of the 8520. The transfer begins after the first byte is written to the serial data register. The CIA transfers the byte into the shift register. The individual data bits now appear at half the timeout rate of timer A on the SP line and the clock signal from timer A appears on the CNT line (it changes value on each timeout so that the next bit appears on the SP line on each negative transition [high to low]). The transfer begins with the most significant bit of the data byte.
Once all 8 bits have been output, the CNT line remains high and the SP line retains the value of the last bit sent. In addition, the SP bit of the ICR data register is set to show that the serial data register can be supplied with new data. If the next data byte was loaded into the data register before the output of the last bit, the data output will continue without interruption. To keep the transfer continuous, the serial data register must be re-supplied with fresh data at the proper time.

The SP and CNT lines are described as 'open-collector' outputs. This allows the outputs of multiple CIAs to be connected together. See any good text on electronic interfacing techniques for an explanation of open-collector outputs (especially the mass of material on the IEEE-488 bus).

On the Amiga, the SP line of CIA-A is connected to the KDAT line of the keyboard 6500/1 processor, and the CNT line connected to the KCLK line of the keyboard processor. So hardware keyboard reads can be performed by enabling serial port interrupts in CIA-A (which AmigaDos does for its own use), and writing an autovector interrupt routine to intercept the ICR SP signal and read the serial data register. The SP of CIA-B is connected to the Centronics BUSY signal (as is PA0 of the CIA parallel port A) and the CNT line to the Centronics PAPER OUT signal (as is PA1 of the CIA parallel port A).

Other Information: other connections for CIA-A are:

        PC      /DRDY - Centronics handshake, data ready
        FLAG    /ACK - Centronics handshake, data acknowledge
        IRQ     /INT2 input from Paula (#2)
        RES     /RES reset line

and connections for CIA-B are:

        PC      not used
        FLAG    /INDEX - index signal from disc drive
        IRQ     /INT6 input from Paula (#2)
        RES     /RES reset line

One final point. For those with access to FAST RAM, it is possible to examine the Kickstart ROM on the Amiga 500.
Simply set the Memory Overlay Bit (OVL, bit 0, CIAAPRA) from a program in FAST RAM, and the CHIP RAM memory address range becomes mapped onto the underlying Kickstart ROM. This can be read at will. DO NOT TRY THIS FROM A PROGRAM WITHIN CHIP RAM - THE PROGRAM WILL BE LEFT HANGING AS THE 68000 PC REFERENCES THAT PART OF THE KICKSTART ROM WHICH HAS THE SAME ADDRESS AS THE NEXT INSTRUCTION IN YOUR OVL SWITCHING CODE! I HEAR THE GURU KNOCKING ALREADY...


Hardware:DMA

DMA (Direct Memory Access) is the technique used by most of the Amiga custom chips to perform their functions. The system is organised quite neatly, in that there are two 'halves' to the processor busses. One half is connected to all of those components accessible solely to the 68000 (such as FAST RAM if it is present) and the CIAs. The other half contains the CHIP RAM and the custom chips. The two halves are separated by a buffer, which disconnects the custom chip half from the processor whenever the processor makes an access to the FAST RAM or the CIAs. The custom chips then have total access to the CHIP RAM. If the processor accesses the CHIP RAM or the custom chip registers, then the buffer re-establishes the connection. In this case, there exists a risk of bus contention, where two bus controllers try to take over the busses simultaneously with obviously disastrous results. Bus accesses are nested, so that this problem is largely avoided. Also, the processor can wait until the bus is free if the blitter has absolute priority (this can be set under the control of software). Also, there exist odd and even bus cycles, and DMA access is restricted to the odd bus cycles, the even bus cycles being granted under normal circumstances to the processor.

Right, we've got that out of the way. Now for the details.

The DMA system on the Amiga consists of DMA channels, each channel assigned to one of the hardware functions.
The full list of DMA channels is:

Bitplane DMA (6 channels): These channels are used by that part of the hardware which takes the bitplane data & converts it for output to the screen. If you select fewer bitplanes, then some bitplane DMA channels remain unused.

Sprite DMA (8 channels): These channels are used by the sprite processor. Once a given sprite DMA channel has been used for generating a sprite, it can be re-used for another (with some restrictions). See later.

Disc DMA (1 channel): Data transfer from disc to RAM or vice versa.

Audio DMA (4 channels): These channels are used for processing of audio data in RAM & passing them to the sound chip. Incidentally, for anyone interested in music, conversion of Amiga sound data to Fairlight synthesizer data and vice versa is possible! Paula's sound system possesses a limited Fairlight compatibility.

Copper DMA (1 channel): This channel is used for data transfer of command words to the Copper. If the Copperlist tells the Copper to change the values of registers bound to other DMA channels, those channels are used for those purposes selected within the Copperlist. The Copper DMA channel is solely for Copperlist transfer to the Copper.

Blitter DMA (4 channels): These channels are used for data transfer to and from the blitter.

Now, to make life interesting in some ways (and simpler in others) the Amiga designers related the DMA channel priorities and usages to the construction of the video picture, and the timings in bus cycles are related to the video picture mainly in order to make construction of the fabulous Amiga graphics displays easier.

What is a bus cycle? Simply put, a time span of 280 nanoseconds. This is the time taken for a single memory access across the bus by a device using a DMA channel. Sad to say, the 68000 cannot match this speed and needs 560 nanoseconds per memory access (two DMA bus cycles).
The system is designed so that during these two bus cycles, access to the bus is split between the DMA channels and the 68000 as mentioned above, into odd and even cycles. Note: I haven't distinguished between read and write memory accesses for one simple reason, namely that both take the same time within this system.

Now, if we number the cycles from zero upwards, cycle zero is given over initially to the processor. If the processor wants access to the bus, it gets it initially during cycle zero. Once cycle zero has finished, cycle one begins, which is reserved (as are all odd cycles) for the DMA controller. The DMA controller gets access to the bus during cycle one. Once cycle one has elapsed, cycle two begins and so on, access to the bus alternating between the 68000 and the DMA controller. This assumes that the 68000 wants to access the bus continuously during these cycles. If the 68000 is performing internal processing upon register data, or accessing true FAST RAM only, then the buffer isolates the CHIP RAM section from the 68000, and the DMA controller can access CHIP RAM during even bus cycles as well. So, in an ideal world, with a 10 meg FAST RAM expansion, your Amiga runs like the proverbial bat out of hell.

Should the 68000 want to access CHIP RAM, however, then the fun begins, because the DMA controller is stuck with the odd cycles, except under certain conditions. Generally, the audio, disc and sprite DMA only use up the odd bus cycles. Thus audio, disc and sprite accesses do not slow up the processor. If large amounts of bitplane activity are required, or the blitter is activated, then some of the processor's even cycles are 'stolen' for these purposes. So when this happens, the 68000 runs more slowly.

Ok, I said that bus cycles are related to the video picture. Well, they are. Now I'll explain how.
A raster line on the screen takes 63.5 microseconds to produce. This is equal to 227.5 bus cycles per raster line. Each of the bus cycles that occur during this time is allocated for some purpose or other. The odd cycles are generally reserved for disc, audio and sprite DMA first, then bitplane DMA. The Copper and the blitter both use even bus cycles (tut, tut, that was naughty, Commodore!) and thus chew into the time available for the processor.

Anyone possessing a copy of the Amiga System Programmer's Guide will no doubt have seen the little chart of DMA bus cycles for one raster line. If so, it will be noticed that the little boxes representing DMA bus cycles do not add up according to the values given along the top of each chart section. So my attempt to list in full how the DMA bus cycles are allocated here has run into trouble. Should anyone come up with a neat system of explaining this that doesn't require inclusion of an IFF picture with the DOC file as an aid, please get in touch.

Now for the start of the really useful information. The DMA control register, called DMACON, consists of two parts. There is DMACON (at offset $96) and DMACONR (at offset $02). DMACON is a write-only register, and DMACONR is a read-only register. Generally, if the register name ends in the letter 'R', it's a read-only register (nice sensible convention this).
Both registers are word sized, and each bit is allocated as follows (#3,#4):

        Bit     Function
        ---     --------
        15      SETIT   (set or clear bits)
        14      BBUSY   (blitter busy - read only)
        13      BZERO   (blitter zero - read only)
        12      Not used
        11      Not used
        10      BLTPRI  (Blitter has absolute priority over the 68000)
        9       DMAEN   (Master DMA enable)
        8       BPLEN   (Bitplane DMA enable)
        7       COPEN   (Copper DMA enable)
        6       BLTEN   (Blitter DMA enable)
        5       SPREN   (Sprite DMA enable)
        4       DSKEN   (Disc DMA enable)
        3       AUD3EN  (Enable Audio channel 3)
        2       AUD2EN  (Enable Audio channel 2)
        1       AUD1EN  (Enable Audio channel 1)
        0       AUD0EN  (Enable Audio channel 0)

For bits 10 down to 0, if the bit is set, the corresponding function is enabled, else it is disabled. So to enable the blitter DMA, set bit 6 of the DMACON register.

The DMACON register takes command words formed in a way that looks slightly odd if you don't understand the rationale behind it. Basically, the scheme avoids the need to read the current status, make a mask, exclusive-OR in your choice of bits to change and write back the result, just to ensure that ONLY your choice of bits is changed. Instead, bit 15 is used to decide whether you want to set or clear the appropriate bits (hence the name, SETIT), and the other bits are used to signal whether each of the given bits is one of your choice (set if this is so, clear if not). This has the welcome effect of ensuring that DMA channels not under consideration are left alone during your write to DMACON.

Example: to enable the bitplane DMA, bit 8 must be set. The command word thus has bit 8 set (select the bitplane DMA bit), bit 15 SET (to signal that bit 8 is to be set) and all other bits zero (leave the other DMA channels unaffected). If I wished to disable the bitplane DMA, the command word I would write would have bit 8 set, bit 15 CLEAR, all others clear.

Cross-reference #3: the CIA chip documentation above refers to the set/clear mechanism in connection with its control registers.
The above paragraphs give a complete explanation of this mechanism for anyone who has used the cross-referencing scheme to find out more. This mechanism also is used by the INTENA register (#4) and ADKCON (#5).

As an extra safety feature, DMA channels only become truly enabled when the master DMA enable bit (bit 9) is set. So it is possible to select DMA channels one at a time, then suddenly enable the lot in one go by enabling the master DMA enable bit. Clearing the master DMA enable bit disables ALL DMA channels. Of course, multiple channels can be enabled, e.g.,

        MOVE.W  #$8380,DMACON(A5)

enables the master DMA control, the bitplane DMA, and the Copper DMA in one go (see the Preamble for notes on my choice of addressing mode and why). Note that the current DMA enable status can be obtained via the instruction

        MOVE.W  DMACONR(A5),D0

or something similar. A set bit indicates a DMA channel enabled.

Incidentally, should you want to kill off AmigaDos and Exec totally (as games writers like to) then kill off all DMA using the instruction

        MOVE.W  #$7FFF,DMACON(A5)

in your code. This alone won't do it, as you'll need to kill off the interrupts as well, but it goes a long way toward doing precisely that. The full method is:

        1) Kill off all DMA as above;

        2) Kill off all interrupts (see below for how to do it);

        3) Point all 68000 interrupt vectors to your own custom interrupt handlers, or RTE if you don't want to handle a given IPLx interrupt level;

        4) Get into supervisor mode & change the 68000 IPLx level to make the 68000 acknowledge only those interrupts you want it to (the 4703 actually generates them - see below for how to make the 4703 generate only those interrupts that you want) using MOVE to SR (VERY NAUGHTY! SPANKING TIME FOR ALL YOU FETISHISTS OUT THERE!);

        5) Set up your own DMA system but DON'T ENABLE YET;

        6) Enable 4703 interrupts (again see below);

        7) NOW ENABLE YOUR DMA!

At this point Exec etc., is dead and beyond resurrection other than via a hard reset.
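
Steps 1 and 2 of the list above might be sketched as follows. The equate names and the takeover label are assumptions of mine (they simply follow the register names used in this file), and the INTENA/INTREQ writes rely on those registers using the same set/clear scheme as DMACON. Steps 3 to 7 depend entirely on your own handlers and display setup, so they appear only as comments.

```asm
CUSTOM  EQU     $DFF000             ;custom chip base address
DMACON  EQU     $096                ;DMA control (write only)
INTENA  EQU     $09A                ;interrupt enable (write only)
INTREQ  EQU     $09C                ;interrupt request (write only)

takeover
        LEA     CUSTOM,A5
        MOVE.W  #$7FFF,DMACON(A5)   ;1) bit 15 clear: switch off all DMA
        MOVE.W  #$7FFF,INTENA(A5)   ;2) disable every interrupt source
        MOVE.W  #$7FFF,INTREQ(A5)   ;   and drop any requests still pending
        ;3)-5) install your own vectors, set the IPL mask and
        ;      prepare your Copperlist and bitplane pointers here
        ;6)-7) then write INTENA and DMACON with bit 15 SET to
        ;      enable only the sources and channels you need
        RTS
```

Nothing here saves the old DMACONR or INTENAR contents, so once this has run there is no tidy way back to the OS.
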
If that's what you want, so be it.


Hardware:Interrupts

Ok, I've already mentioned a little about interrupts above in the DMA section but not the whole story. Now is the time to correct the omissions.

I've already mentioned the existence of the 4703 interrupt controller, which actually generates the interrupts. All that the 68000 does within the Amiga is respond to these externally generated interrupts if the value of the IPLx bits allows it to. The standard value of SR on the Amiga is $0100 in user mode, $2100 in supervisor mode, and all interrupts from level 1 onwards are responded to - there's a hell of a lot of interrupts on this computer!

The 4703 interrupt controller is programmed via two sets of custom chip registers. Again, each set has a read-only and a write-only component. These registers are:

        INTREQ  (offset $09C) : write only
        INTREQR (offset $01E) : read only
        INTENA  (offset $09A) : write only
        INTENAR (offset $01C) : read only

INTENA is the interrupt enable register, and INTREQ is the interrupt request register. Again, the read-only ones end in 'R'. The structure of all four of these registers is identical - they are all split into individual bits, which are assigned as follows (the numbers in square brackets in the right-hand column correspond to the 68000 IPLx priority of the said source - note that the NMI interrupt of priority 7 is never used):

        Bit     Function
        ---     --------
        15      SETIT   (just like the DMACON register above)
        14      INTEN   (master interrupt enable)               [6] *
        13      EXTER   (int. from CIA-B or expansion port)     [6]
        12      DSKSYN  (disc sync value recognised)            [5]
        11      RBF     (serial receive buffer full)            [5]
        10      AUD3    (output audio data channel 3)           [4]
        9       AUD2    (output audio data channel 2)           [4]
        8       AUD1    (output audio data channel 1)           [4]
        7       AUD0    (output audio data channel 0)           [4]
        6       BLIT    (blitter ready)                         [3]
        5       VERTB   (vertical blank interrupt)              [3]
        4       COPER   (Copper interrupt)                      [3] *
        3       PORTS   (CIA-A or expansion port)               [2]
        2       SOFT    (reserved for software interrupts)      [1] *
        1       DSKBLK  (Disc DMA transfer done)                [1]
        0       TBE     (serial transmit buffer empty)          [1]

Once again, if the corresponding bit from bit 13 down is set in the INTENA register, that interrupt source is enabled. If one of those bits is set by an interrupt source in the INTREQ register, the corresponding interrupt is generated & sent to the 68000. Setting or clearing the bits in INTENA/INTREQ is done in an identical fashion to the DMACON register above.

Again, the state of the master interrupt enable bit (bit 14) determines whether the 4703 can generate any interrupts at all. If bit 14 is clear, NO interrupts will be generated by the 4703. Needless to say, having decided which interrupts to enable, one can enable them singly or in one go by choosing the appropriate value to write to INTENA. So, if the INTEN bit is set, and the given bit for the interrupt source is set, that interrupt CAN be generated, even if there is no guarantee that it WILL be generated.

Ok, we can decide which interrupts to respond to. How does the 4703 generate them? Simple. Any interrupt source setting a bit in the INTREQ register causes the 4703 to generate the appropriate interrupt, at which point the 68000 gets to know about it.
This can either happen in hardware (e.g., the blitter finishes its job & posts its interrupt signal) or can be performed in software, e.g., by setting the SOFT bit in your code directly - this will generate a software interrupt signal as used by Exec for its softints, but unless Exec is alive and well, DON'T expect Exec to handle it like a softint! There is a special case, the Copper interrupt. The Copper can be made to set its own reserved Copper interrupt bit directly within a Copperlist (see later), as a means of forcing a Copper interrupt other than the vertical blank (which is handled in hardware), and so is totally under your control. Those bits of INTREQ capable of being set in software by your programs at will are marked in the above list with a '*'. The others CAN be set, but care must be exercised if you are to do this in your code (usually done if you are writing a set of interrupt handlers to handle the functions on a continuous basis, and HAVE to set the appropriate INTREQ bit to start the sequence off).

So, the 68000 hears about an interrupt if:

        1) INTEN (master interrupt enable) is set in INTENA;

        2) The corresponding interrupt source bit is set in INTENA;

        3) The corresponding interrupt source bit is also set in INTREQ.

All three conditions need to be fulfilled, else the 68000 will think that no interrupts are being generated by the interrupt source concerned. Further information is available in the file 'typed_interrupts.doc' in this series.

Cross-reference #4 : for a full explanation of the SETIT bit, see the cross-reference #3 above.

What do we do now? Well, you need an interrupt handler terminated by an RTE & the appropriate 68000 interrupt vector changed to point to it. If that interrupt is enabled, and its appropriate bit in INTREQ is set, then the handler will be called. Your handler should do the following:

        1) Read INTREQR to find out which interrupt request was made.
           Some of the 68000 interrupt handlers will have to handle more
           than one interrupt source, and need to know which caused the
           interrupt exception.

        2) If the bit corresponding to the interrupt source that you are
           interested in is set in INTREQR, then CLEAR it in INTREQ to
           signal that you've acknowledged it. The 4703 can now handle
           another interrupt of the same type. Incidentally, things will
           get interesting if the interrupt source posts its interrupts
           faster than you can respond to them!

        3) Process your interrupt as you wish from this point on.

This, of course, is over and above the usual things that interrupt handlers are supposed to do, such as save scratch registers on the stack & other junk that you should already be aware of. And for crying out loud, please use the MOVEM instruction to do it!

Note that unlike the CIAs above, the INTREQ bits are NOT cleared when INTREQR is read! You have to clear them the hard way!

I also would like to point out that setting the INTEN bit (bit 14, marked '*' in the list above) of INTREQ will cause a level 6 autovector interrupt to occur, so if you want, you can use this to create your own level 6 soft interrupts - nothing else (other than the Copper should you want it to) will set this bit but your code. Gee, isn't that nice?

A word of warning: on my machine at least (this may vary), something quaint happens if one tries performing the instruction

        CLR.W   INTREQ(A5)

or similar, due to the read/write access of the 68000 CLR instruction (see the Preamble). The machine seems to 'hiccup' before carrying on as normal. I would not suggest doing this too often, it might blow something expensive to mend. In any case, the above instruction does bugger all to the status of the interrupt system for reasons made obvious upon analysis (see #3 above), so I don't think that there's much point in doing it - I only did it by accident.
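As an illustration of the three handler steps listed above, here is a minimal sketch of a level 3 handler. It assumes that VERTB is the only level 3 source enabled in INTENA (with BLIT or COPER also enabled, their bits would have to be tested and cleared in the same way, or the handler would be re-entered for ever). Level3Handler and FrameCount are made-up names.

```asm
CUSTOM		equ	$DFF000
INTREQR		equ	$01E
INTREQ		equ	$09C

Level3Handler:
	movem.l	d0/a5,-(sp)		; save scratch registers with MOVEM
	lea	CUSTOM,a5
	move.w	INTREQR(a5),d0		; 1) find out which request was made
	btst	#5,d0			;    bit 5 = VERTB
	beq.s	.notvbl
	move.w	#$0020,INTREQ(a5)	; 2) SETIT clear + bit 5: acknowledge VERTB
	addq.l	#1,FrameCount		; 3) process: here, just count frames
.notvbl:
	movem.l	(sp)+,d0/a5
	rte

FrameCount:
	dc.l	0
```

The acknowledge is a plain MOVE.W of a command word with SETIT clear, so nothing is ever read back from the write-only INTREQ register.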
I cannot stress too much, however, that the Preamble caveat about CLR should be adhered to (I only found out after extensive use of CLR in my code & had to rip it all apart again to make it work properly. Some unfortunates have some copies of my old code instead of the CLR-free versions & wonder what the hell is going on when they run it...).


Hardware:The Copper

This is one of the harder chapters to write, and for a very good reason. The Copper is a very powerful little piece of hardware, but with that power comes complexity.

The Copper is a coprocessor capable of writing values to the custom chip registers independently of the 68000, and of performing actions based on the position of the video beam. All in all, a highly useful little fellow. As befits what is a processor in its own right, it has its own machine language, and it is programs written in this special Copper machine language that are the famous Copperlists of Amiga parlance. At this point, those thinking to themselves 'oh, no, not another machine language to learn!' should be reassured by the knowledge that there are only three basic instruction types in Copper machine language. These three basic instruction types are versatile, however, and much can be done with them.

Sad to say, Copperlists have to be created by hand as far as I am aware, at least if you want to take advantage of ALL Copperlist features. I have been told that the Argasm assembler can take Copperlists in the form of Copper assembly language & assemble them, but I've yet to check this. And yes, there is a Copper assembly language to make life a little easier when creating Copperlists.

So, to the Copper Assembly Language (CAL for short).
The three basic CAL instructions are:

        MOVE : Write an immediate data value into a custom chip
               register (like the 68000 MOVE #nnnn,xxxx);

        WAIT : Wait until the electron beam generating the video
               picture has reached a certain position;

        SKIP : If the electron beam has reached the specified
               position, skip the next CAL instruction, else execute
               sequentially as normal.

Doesn't seem much, does it? Well, you can do a hell of a lot with this to hand. I shall deal with the instructions in turn.

The MOVE instruction: this instruction allows the Copper to write an immediate value into a custom-chip register. The register is specified in the instruction as an offset from the base address $DFF000 (now you see why I prefer using offset(An) in 68000 for addressing custom chip registers: the Copper does it in hardware, which makes for consistency in programming) and the value to write is always a WORD value. The CAL syntax is

        MOVE    #value,register

for anyone blessed with a CAL assembler (or Argasm if it does this for you). The actual way that the instruction is coded as machine words in memory is:

        %0000000rrrrrrrr0,$XXXX

Here, rrrrrrrr represents the register address. Since all register offsets from $DFF000 are even, bit 0 of the first word of a MOVE instruction is always zero. The second word contains the 16-bit value to write to the chosen custom-chip register. For example, to write the value corresponding to the colour light green into palette register 3, one would use the CAL syntax

        MOVE    #$03C3,COLOR03

which would become (since COLOR03 is at offset $186) the machine words

        $0186,$03C3

This seems simple enough. Now, the fun starts. There is a restriction upon the registers that can be written to by the Copper. Under normal circumstances, the Copper cannot write to registers from offsets $000 to $07E (most of these are read-only anyway). There exists a special custom chip register, the COPCON register, consisting of one bit (bit 0).
If this bit, which is called the Copper Danger Bit or CDANG bit, is set, then the Copper can access the custom chip registers from offsets $040 to $07E, which just happen to be the blitter control registers. Access to the registers from offsets $000 to $03E is NEVER allowed. COPCON itself is at $02E (write-only) and inaccessible to the Copper itself (the 68000 must write to this register).

So, bearing this restriction in mind, the Copper can write to most of the custom chip registers, and if allowed to by the 68000, can write to the blitter control registers and influence the blitter. This alone gives the Copper considerable power within the system. In particular, the Copper can change the DMA enable status, the interrupt enable status, the sprite control registers, the palette, the bitplane control, the sound chip, and to a limited extent, the disc controller! Needless to say, it's only safe to let the Copper do all this when you know how it's all done. See each section in turn for the requisite information.

The WAIT instruction: this instruction causes the Copper to do just that, wait. The Copper waits for a specific position to be reached by the electron beam generating the video picture before continuing execution of the remaining instructions in the Copperlist. This is how various tricks are achieved, such as changing background colours at specified screen positions to create 'sunset' effects etc. The Copperlist contains a sequence of WAITs interspersed with MOVEs to the background colour palette register, COLOR00 at offset $180. Should the electron beam have already passed the given position, normal sequential execution is resumed. The CAL syntax for WAIT is

        WAIT    (x1,y1) MASK (x2,y2) BFD

(Anyone used to the CMOVE/CWAIT macros on Devpac here forget those - they never use the full power of the WAIT instruction. This is MY defined CAL syntax for the WAIT instruction, that tells you everything). In this syntax, x1 and y1 are the beam position to wait for.
In this syntax, the MASK and BFD entries are optional and can be omitted, but if they are included they have a profound effect. More of this below. If omitted, the WAIT instruction makes the Copper wait until the beam reaches position (x1,y1) and then resume normal sequential execution.

If the MASK specifier is included, the fun starts. Instead of using (x1,y1) directly, the comparison of the beam position is made with the values formed by logically ANDing together x1 and x2 for the horizontal position, & ANDing together y1 and y2 for the vertical position. Omitting the MASK specifier is to be regarded as having the same effect as having a MASK value of (-1,-1), or all 1's in binary (in which case x1 AND x2 becomes x1, etc.). This opens up many possibilities. For example, in the instruction

        WAIT    (0,$0F) MASK (-1,$0F)

the WAIT condition will be fulfilled every 16 lines, i.e., whenever the lower four bits of the vertical beam position are all 1. Note the -1 mask value for the horizontal position (i.e., all 1's binary). Since in this example I am not interested in the horizontal position at all, I could have had a horizontal mask of 0, but if the horizontal position is important, choose the mask accordingly. The mask bits affect BOTH the specified position in the instruction AND the actual beam position coordinates before the position comparison is performed.

The machine words for the WAIT instruction take the form:

        %vvvvvvvvhhhhhhh1,%bvvvvvvvhhhhhhh0

In the first word, the vvv bits specify the vertical beam position, and the hhh bits the horizontal beam position. Note that bit 0 is equal to 1. This distinguishes WAIT (and SKIP below) from MOVE. WAIT is distinguished from SKIP by having bit 0 of the second machine word set to zero (for SKIP, this is set to 1).

In the second word, the b bit is the Blitter Finish Disable bit, or BFD bit. In my CAL syntax, including BFD in the instruction specification has the meaning 'set the BFD bit in the second word'.
This bit is used when the Copper is used to start a blitter operation. The Copper in general must know when the blitter has finished whatever blitter operation was started at some time past, whether started by the Copper or the 68000. If the BFD bit is zero (omitting BFD in the CAL syntax means 'clear the BFD bit') then the Copper will WAIT until the blitter finishes, and THEN check the wait condition. If the BFD bit is one, the blitter status is ignored. If COPCON=0, and the Copper does not affect the blitter in any way, BFD should be set to 1.

In the second word, the vvv bits are the vertical mask, and the hhh bits the horizontal mask. If no mask is specified, these are all set to 1.

Note that bit 7 of the vertical position cannot be masked. If a vertical mask is specified, then bit 7 of the vertical position is always treated as though the mask bit for that position was 1.

Note that since there are 313 lines of display, and the vertical position is only 8 bits wide (and the vertical mask 7 bits wide), WAITs for vertical positions greater than 255 have to be performed as

        WAIT    (0,255)
        WAIT    (0,y)

(y here being the wanted line number less 256).

NOTE : Horizontal positions are specified in steps of 4 low-resolution pixels and NOT in pixel coordinates directly!

The SKIP instruction: the only difference between this instruction & the WAIT instruction in terms of machine words is that bit 0 of the second word is 1 instead of 0. The bits are otherwise identical in format to those for the WAIT instruction. The SKIP instruction allows conditional branches to be set up within a Copperlist. The mechanism is slightly quirky, however. Basically, the beam position is compared with the (x1,y1) arguments as for the WAIT instruction, with MASK data also applying identically. If the beam position is greater than or equal to the (x1,y1) argument (MASK notwithstanding) then the Copper skips the next Copperlist instruction, and moves on directly to the instruction following it.
Otherwise, instruction execution continues in the normal sequential fashion. Full information on conditional branches using this instruction is given below. As may be expected, my CAL syntax for the SKIP instruction is

        SKIP    (x1,y1) MASK (x2,y2) BFD

just as for the WAIT instruction. Comments regarding the optional MASK and BFD arguments applying to WAIT apply identically to SKIP.

Now, we have the three Copper instructions. A Copperlist is simply a sequential list of these instructions, in machine word format, in memory. As may be expected, the Copperlist MUST be in CHIP RAM.

Having read this far, one may wonder about how a Copperlist is terminated. A trick is used here. The final instruction in a Copperlist is a WAIT instruction for an impossible beam position, such as WAIT ($FE,$FF). This condition will never be fulfilled because a horizontal beam position greater than $E4 isn't possible. I personally use the machine words

        DC.W    $FFFF,$FFFE

as the end of my Copperlist. When the beam reaches the end of the video picture, the Copper is automatically restarted at the start of the Copperlist (unless you arrange otherwise!).

Ok. We have a Copperlist, complete with the impossible WAIT at the end. How do we tell the Copper to execute our Copperlist? The Copper has a set of registers for this. These are:

        COP1LCH         offset $080
        COP1LCL         offset $082
        COP2LCH         offset $084
        COP2LCL         offset $086
        COPJMP1         offset $088
        COPJMP2         offset $08A

The two register pairs, COP1LCH/L and COP2LCH/L, are loaded with the 18-bit CHIP RAM address of the start of your Copperlist (or some other address - see later!). For simple Copperlists, COP1LCH/L alone is used. Having done this, and turned on the Copper DMA, writing any value to COPJMP1 starts the Copper executing your Copperlist. When the Copper reaches the impossible WAIT instruction at the end of your Copperlist, it will WAIT until the vertical blank occurs, at which point the Copper will restart at the address loaded into the COP1LCH/L pair.
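To tie the machine-word formats and the start-up sequence together, here is a small sketch of a 'sunset' style Copperlist and the code to start it. It is only an illustration: the labels StartCopper and Copperlist, the colours and the line numbers are made up, and the SECTION ...,DATA_C directive (Devpac 3 style) is an assumption about how your assembler places data in CHIP RAM.

```asm
CUSTOM		equ	$DFF000
DMACON		equ	$096
COP1LCH		equ	$080
COPJMP1		equ	$088
COLOR00		equ	$180

StartCopper:
	lea	CUSTOM,a5
	move.l	#Copperlist,COP1LCH(a5)	; one long write fills COP1LCH and COP1LCL
	move.w	#$8280,DMACON(a5)	; SETIT+DMAEN+COPEN: Copper DMA on
	move.w	d0,COPJMP1(a5)		; any value will do: start the list
	rts

	section	copper,data_c
Copperlist:
	dc.w	COLOR00,$0000		; MOVE #$0000,COLOR00  (black)
	dc.w	$6401,$FFFE		; WAIT (0,$64), no MASK, BFD set
	dc.w	COLOR00,$0F00		; MOVE #$0F00,COLOR00  (red)
	dc.w	$C801,$FFFE		; WAIT (0,$C8), no MASK, BFD set
	dc.w	COLOR00,$000F		; MOVE #$000F,COLOR00  (blue)
	dc.w	$FFFF,$FFFE		; impossible WAIT: end of list
```

Each MOVE is simply the register offset followed by the data word, and each WAIT has the line number in the top byte of the first word with bit 0 set.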
More correctly, COPJMP1 causes the data in the COP1LCH/L pair to be transferred to the Copper's internal program counter, and execution initiated. If you have an address in COP2LCH/L, writing to COPJMP2 will cause that value to be loaded instead. For simple Copperlists (no interlace or any conditional branches) the COP1LCH/L and COPJMP1 set are the defaults used, & these will be used by the Copper for restarting the Copperlist execution at the vertical blank interval.

NOTE: the Copper needs its 'programs' aligned on a word boundary just like the 68000. In fact, word alignment holds for all custom chip operations - unless I discover any exceptions and document them in this file, treat word alignment as mandatory.

Note that the COPxLCH/L register pairs are NOT program counters! The Copper's program counter is NOT directly accessible, and the values stored in the COPxLCH/L register pairs are INITIAL program counter values, needing to be loaded only once under normal circumstances. The values in COPxLCH/L never change once loaded, unless

        1) they are changed by the 68000 at some future time (under your
           control!);

        2) the Copperlist contains instructions such as

                MOVE    #value_hi,COP1LCH   ;high word of 18-bit value
                MOVE    #value_lo,COP1LCL   ;low word of same

           which cause the Copper to alter them itself.

Conditional Branches: now it may already have become obvious to the astute reader how to form a conditional branch in CAL. For those who haven't yet worked out all of the details, here they are. First, one needs to know the absolute address in memory of the point to which to branch. BEFORE the branch is to be executed, reload COP1LCH/L with this new address - the Copper can be made to do it using instructions such as the example above. Then immediately after a SKIP instruction, place the Copper instruction

        MOVE    #0,COPJMP1

and the rest of the Copperlist following this.
If the beam position when the Copper reaches the SKIP instruction is greater than or equal to the SKIP instruction's position argument, the MOVE above will be skipped, and the remaining instructions of the Copperlist following the MOVE executed. If the position is less than the argument, the MOVE above will be executed, and the Copper will load its internal program counter with the new value of COP1LCH/L that you forced it to earlier on. Hey presto! Conditional branch! Of course, if you wish to leave COP1LCH/L alone, you can use COP2LCH/L instead, and use a MOVE to COPJMP2 to cause the branch instead.

WARNING: If you change COP1LCH/L in order to force a conditional branch using the above mechanism, REMEMBER TO RESET IT TO POINT TO THE START OF YOUR COPPERLIST AGAIN AFTER THE BRANCH! You have to do this in two places in your Copperlist, once after the MOVE to COPJMP1 (to take account of branch not performed) and once immediately after the branch point (to take account of branch performed). Failure to do this will result in the Copper failing to execute portions of your Copperlist after the first execution!

Interlaced playfields: you need two Copperlists for this, one for the long frame and one for the short frame. The long frame Copperlist (the first one) should initialise bitplane pointers to point to the FIRST line of the bitplanes, and the short frame Copperlist should initialise the bitplane pointers to point to the SECOND line of the bitplanes. At the end of the long frame Copperlist, before the impossible WAIT, insert two instructions to set the COP1LCH/L pair to point to the short frame Copperlist. Similarly, at the end of the short frame Copperlist, place instructions to point COP1LCH/L at the start of the long frame Copperlist. The Copper will then alternate back and forth between the two Copperlists.
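The ping-pong arrangement of the two interlace Copperlists might be sketched like this. Since an assembler cannot normally split a relocatable address into two DC.W halves, the sketch pokes the addresses into the lists at run time. All labels (LinkLists, PokeAddr, LongList, LongTail, ShortList, ShortTail) are made up, the bitplane pointer MOVEs are left out, and starting the Copper on the right list for the right frame is not shown.

```asm
COP1LCH		equ	$080
COP1LCL		equ	$082

; Make the tail of each list point COP1LCH/L at the other list.
LinkLists:
	move.l	#ShortList,d0
	lea	LongTail,a0
	bsr.s	PokeAddr		; long frame list hands over to the short one
	move.l	#LongList,d0
	lea	ShortTail,a0		; and vice versa (falls through to PokeAddr)
PokeAddr:
	move.w	d0,6(a0)		; low word  -> data word of MOVE to COP1LCL
	swap	d0
	move.w	d0,2(a0)		; high word -> data word of MOVE to COP1LCH
	rts

	section	copper,data_c
LongList:
	; bitplane pointer MOVEs for the FIRST line would go here
LongTail:
	dc.w	COP1LCH,0		; MOVE #hi(ShortList),COP1LCH (patched)
	dc.w	COP1LCL,0		; MOVE #lo(ShortList),COP1LCL (patched)
	dc.w	$FFFF,$FFFE		; impossible WAIT
ShortList:
	; bitplane pointer MOVEs for the SECOND line would go here
ShortTail:
	dc.w	COP1LCH,0		; MOVE #hi(LongList),COP1LCH (patched)
	dc.w	COP1LCL,0		; MOVE #lo(LongList),COP1LCL (patched)
	dc.w	$FFFF,$FFFE		; impossible WAIT
```

Call LinkLists once before the Copper is started; after that the Copper itself does the swapping every frame.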
In addition, the bitplane control (see below) needs to have the LACE bit set, and various other instructions need to be executed to ensure proper system synchronisation, and ensure that your interlaced playfields are displayed properly. More details in the section below on bitplane control.

Note: incorrect setting of the Copper registers can lead to the so-called FIREWORKS_MODE of the Amiga occurring. This occurs when the Copper is pointed to an invalid area of memory, and the Copper tries to execute what it thinks is a Copperlist there. The Copper isn't intelligent (like some COBOL programmers I know) and thinks that anywhere it's pointed to is a Copperlist to execute, and happily runs it. This usually has weird and wonderful effects such as screwing up the screen completely. This phenomenon (the runaway Copper syndrome) is the ONE event that can crash the Amiga completely, beyond even a Guru recovery. It's sometimes pretty to watch, but can result in your Amiga containing a fried Agnus or something else fatal if you don't switch off the moment it happens. YOU HAVE BEEN WARNED.

Copper interrupt: In the section on interrupt control, it was stated that the Copper has its own interrupt control bit in INTENA/INTREQ. To signal a Copper interrupt, place the instruction

        MOVE    #$8010,INTREQ

into your Copperlist at the desired point, and the Copper will force the interrupt system to generate the Copper interrupt. Any interrupt bit can be set this way, but the above instruction sets bit 4 of INTREQ, specially provided for the Copper. By placing this instruction after suitable WAIT instructions one can tell the 68000 that a given screen position has been reached, and this is the recommended method of setting up Raster Interrupts. Amiga Raster Interrupts are completely controllable, and can be made to occur not only at a given raster line position, but at a given screen column as well! Using the MASK option in WAIT allows all manner of wonderful tricks to be performed.
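A raster interrupt set up this way might look like the following sketch. The labels EnableCopperInt and RasterList and the choice of line $80 are made up for illustration; a level 3 handler that tests and clears bit 4 of INTREQ is assumed to be installed already, and SECTION ...,DATA_C is again an assumption about your assembler.

```asm
CUSTOM		equ	$DFF000
INTENA		equ	$09A
INTREQ		equ	$09C

EnableCopperInt:
	move.w	#$C010,CUSTOM+INTENA	; SETIT+INTEN+COPER: allow the interrupt
	rts

	section	copper,data_c
RasterList:
	dc.w	$8001,$FFFE		; WAIT (0,$80), no MASK, BFD set
	dc.w	INTREQ,$8010		; MOVE #$8010,INTREQ: SETIT + bit 4 (COPER)
	dc.w	$FFFF,$FFFE		; impossible WAIT: end of list
```

The 68000 then receives a level 3 interrupt with the COPER bit set in INTREQR each time the beam reaches line $80.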
The only limit is your imagination at this point.


Hardware:Bitplanes & Bitplane Control

Having discovered how to generate a Copperlist, the next logical step is to learn how to control the bitplane usage. All bitplane control registers are accessible to the Copper (except for those below offset $040), and thus one can set up a Copperlist to set these registers to given values. This is of particular value for interlaced playfields, already mentioned in the Copper section above, but can be performed for any type of playfield if wanted.

Bitplane Control Registers: the full list of bitplane control registers is:

        VPOSR    (offset $004)  Read MSB of vertical beam position
        VHPOSR   (offset $006)  Read vertical/horizontal beam position
        VPOSW    (offset $02A)  Write MSB of vertical beam position
        VHPOSW   (offset $02C)  Write vertical/horizontal beam pos

        DIWSTART (offset $08E)  set top left corner of display window
        DIWSTOP  (offset $090)  set bottom right corner of same
        DDFSTART (offset $092)  horiz. pos start of bitplane DMA fetch
        DDFSTOP  (offset $094)  horiz. pos end of bitplane DMA fetch

        BPL1PTH  (offset $0E0)  bitplane pointers, high/low, for
        BPL1PTL  (offset $0E2)  up to 6 bitplanes as wanted
        BPL2PTH  (offset $0E4)
        BPL2PTL  (offset $0E6)
        BPL3PTH  (offset $0E8)
        BPL3PTL  (offset $0EA)
        BPL4PTH  (offset $0EC)
        BPL4PTL  (offset $0EE)
        BPL5PTH  (offset $0F0)
        BPL5PTL  (offset $0F2)
        BPL6PTH  (offset $0F4)
        BPL6PTL  (offset $0F6)

        BPLCON0  (offset $100)  main bitplane control register
        BPLCON1  (offset $102)  scroll values for outsize playfields
        BPLCON2  (offset $104)  sprite/playfield & DUALPF control

        BPL1MOD  (offset $108)  bitplane modulo for odd planes
        BPL2MOD  (offset $10A)  bitplane modulo for even planes

The main bitplane control register BPLCON0 is organised as follows (bit set equals function enabled, bit clear equals function disabled):

        Bit     Function
        ---     --------
        15      HIRES   Turn on high-resolution mode
        14      BPU2    These three bits contain the
        13      BPU1    number of bitplanes used
        12      BPU0
        11      HOMOD   Hold & Modify mode on
        10      DBPLF   Dual Playfield mode on
        9       COLOR   Video output colour (always set this!)
        8       GAUD    Genlock Audio on
        7-4     Unused
        3       LPEN    Lightpen input active
        2       LACE    Interlace mode on
        1       ERSY    External synchronisation on
        0       Unused

Some restrictions exist. HOMOD and DBPLF cannot both be set simultaneously; only one or the other can be set. Both bits can be cleared, however, and if all six bitplanes are enabled, the hardware automatically selects the EXTRA-HALFBRITE mode. Also, GAUD and ERSY are only useful with a Genlock interface (DO NOT SET THEM UNLESS YOU'RE USING ONE!). The legal value range for the BPUx bits is 0 to 6; 7 is not allowed. Don't ask me why 0 is allowed...

Having decided which screen mode is desired, one then needs to set the bitplane sizes. The registers DIWSTART (Display Window Start) and DIWSTOP (Display Window Stop) are used for this. Bits 15-8 contain the vertical position, and bits 7-0 the horizontal position. DIWSTART is assumed to rest in the top left quadrant of the screen. This is a fairly sensible assumption, after all.
Because the vertical position can be from 0 to 312, which needs 9 bits, the top bit (bit 8, not specified in the register) is assumed to be zero, giving vertical positions from 0 to 255. Similarly, the missing bit 8 of the horizontal position is assumed to be 0, giving a horizontal position from 0 to 255.

DIWSTOP is a little more complicated. It is assumed to lie in the lower right quadrant of the screen (sensible again) and hence the missing bit 8 of the horizontal position is assumed to be 1, giving horizontal positions from 256 to 448. Because vertical end positions both greater than and less than 255 should be possible, a trick is used. Bit 15 of DIWSTOP (bit 7 of the vertical position) is inverted to provide the hidden bit 8, making an end position of 128 to 312 possible. For end positions from 256 to 312, make this bit zero (thus making the hidden bit 8 equal to 1), and for end positions from 128 to 255, make this bit 1 (thus making the hidden bit 8 equal to 0). Also, DIWSTOP should have the horizontal and vertical values PLUS ONE set into it to work properly.

Typical PAL values for the screen are TLC (top left corner) coordinates (129,41), and BRC (bottom right corner) coordinates (448,296). This corresponds to DIWSTART = $2981, and since DIWSTOP should contain the values (449,297) instead of (448,296), DIWSTOP = $29C1 is used. This produces a PAL 320 x 256 display area centred in the middle of the monitor display.

Limitations exist on these values. Firstly, monitor tube distortions limit the values (the corners will be cut off if the entire monitor screen area is used), and the blanking gaps need to be taken into account. Vertical blanking gaps occupy lines 0 to 25, making the earliest TLC vertical position 26 lines from the VBL start, and the latest BRC vertical position is 312. The horizontal situation is more complex. The horizontal blanking gap (HBL) lies between columns 30 and 106. Horizontal positions from 107 are possible.

We have set the screen mode, and the bitplane size.
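The arithmetic for the typical PAL window can be left to the assembler, as in this sketch. The symbol names TLC_X, TLC_Y, BRC_X, BRC_Y, DIWSTART_VAL, DIWSTOP_VAL and SetWindow are made up; the hidden ninth bits are simply dropped by masking with $FF, which is only valid for corners lying in the quadrants described above.

```asm
CUSTOM		equ	$DFF000
DIWSTART	equ	$08E
DIWSTOP		equ	$090

TLC_X		equ	129			; top left corner
TLC_Y		equ	41
BRC_X		equ	448			; bottom right corner
BRC_Y		equ	296

DIWSTART_VAL	equ	(TLC_Y<<8)+TLC_X			; = $2981
DIWSTOP_VAL	equ	(((BRC_Y+1)&$FF)<<8)+((BRC_X+1)&$FF)	; = $29C1

SetWindow:
	lea	CUSTOM,a5
	move.w	#DIWSTART_VAL,DIWSTART(a5)
	move.w	#DIWSTOP_VAL,DIWSTOP(a5)
	rts
```

For DIWSTOP the vertical value 297 loses its bit 8 and becomes $29, leaving bit 15 of the register clear, which the hardware inverts to recover the hidden bit.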
Now we need to set up the bitplane DMA. The DMA data fetch must start in synchronisation with the start & stop values, to ensure that the pixels appear in the right places on screen. Vertically, this is no problem. Screen DMA starts and ends in synchronisation with the DIWSTART/DIWSTOP vertical positions automatically and no register control of this is provided. Horizontally, this is a problem however.

To display a pixel on the screen, the current word needs to be read from each bitplane. For 6 bitplanes, low-resolution, 8 bus cycles are needed. In addition, the hardware needs a half bus cycle before the data can appear on the screen. The bitplane DMA must therefore start exactly 8.5 cycles or 17 pixels before the start of the screen window for low-resolution screens, and 4.5 cycles or 9 pixels before the start of the screen window for high-resolution screens. The registers controlling this data fetch are DDFSTART and DDFSTOP (Display Data Fetch Start, and Display Data Fetch Stop). Only bits 7 to 2 are writeable; the others are "don't care" bits and should be set to 0. Bit 2, the lowest writeable bit, should always be 0 for low-resolution screens since the bitplanes are read once every 8 bus cycles, and the values of both DDFSTART and DDFSTOP must be an exact multiple of 8.

Regardless of the resolution, the difference between DDFSTART and DDFSTOP (the Amiga System Programmer's Guide is littered with misprints about here!) must always be divisible by 8, since the hardware always divides the lines into sections of 8 bus cycles each. Even in high-res mode, the bitplane DMA is performed for 8 bus cycles beyond DDFSTOP, so that 32 points are always read.

Let H equal the horizontal start of the screen, as set in DIWSTART. Also, let P equal the number of pixels per line.
The values for DDFSTART and DDFSTOP are thus computed as:

        Low resolution  : DDFSTART = (H/2 - 8.5) AND $FFF8
                          DDFSTOP  = DDFSTART + P/2 - 8

        High resolution : DDFSTART = (H/2 - 4.5) AND $FFFC
                          DDFSTOP  = DDFSTART + P/4 - 8

For our standard PAL window of 320x256 centred as above, we have H=129, P=320 and the values are thus

        DDFSTART : (129/2 - 8.5) AND $FFF8 = $38
        DDFSTOP  : $38 + 320/2 - 8         = $D0

For a high-resolution screen of 640x256 centred as above, we now have H=129, P=640, and the values are thus

        DDFSTART : (129/2 - 4.5) AND $FFFC = $3C
        DDFSTOP  : $3C + 640/4 - 8         = $D4

DDFSTART cannot be less than $18. This is because the first $18 bus cycles are reserved for the memory refresh, disc and audio DMA, and the DMA channel for sprite 0 (used as the mouse pointer) which cannot be turned off. DDFSTOP is limited to a maximum of $D8 (horizontal blank occurs beyond this!).

Bitplane Pointers: the list above of registers includes the bitplane pointers BPLxPTH/L. Each word-sized 'L' register combines with its 'H' counterpart to form a pointer into the CHIP RAM. By setting these pointers to the address of the start of each portion of bitplane memory, and then setting the BPLCON0 register for the appropriate screen mode, bitplane control is almost complete. Note that in this case, the bitplane pointer contents are CHANGED by the operation of the system, as opposed to the COPxLCH/L registers above. As a result, the bitplane pointers need to be reset after each use, either by an appropriate Copperlist, or by the 68000 during the VBL. There exist six other registers called BPLxDAT which are accessible only by the DMA system. When a BPLxPTH/L register pair is accessed to obtain a bitplane address, it is incremented by two after the requisite data word is accessed and passed on to the BPLxDAT register. Once the full complement of BPLxDAT registers for the given screen are loaded, their data is passed to the display electronics, and the process repeated.
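Going back to the DDFSTART/DDFSTOP formulae given earlier in this section, the sums can again be handed to the assembler. This is a sketch under assumptions: the half-cycle terms are folded into integer arithmetic (H/2 - 8.5 becomes (H-17)/2, and H/2 - 4.5 becomes (H-9)/2), and the high-resolution mask is taken as $FFFC, since that is the mask which reproduces the $3C result quoted for the 640-pixel example. The symbol names are made up.

```asm
CUSTOM		equ	$DFF000
DDFSTART	equ	$092
DDFSTOP		equ	$094

WIN_X		equ	129			; H, as set in DIWSTART
LORES_PIXELS	equ	320			; P for low resolution
HIRES_PIXELS	equ	640			; P for high resolution

LORES_START	equ	((WIN_X-17)/2)&$FFF8		; = $38
LORES_STOP	equ	LORES_START+(LORES_PIXELS/2)-8	; = $D0
HIRES_START	equ	((WIN_X-9)/2)&$FFFC		; = $3C
HIRES_STOP	equ	HIRES_START+(HIRES_PIXELS/4)-8	; = $D4

SetFetchLores:
	lea	CUSTOM,a5
	move.w	#LORES_START,DDFSTART(a5)
	move.w	#LORES_STOP,DDFSTOP(a5)
	rts
```

The same constants could equally be placed in a Copperlist as MOVEs to DDFSTART and DDFSTOP.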
Because the DMA system advances them in this way, each of the BPLxPTH/L registers needs to be reset, either at the VBL or within the Copperlist. The actual reading of the bitplane data occurs during the interval between the occurrence of DDFSTART & DDFSTOP. After DDFSTOP has been reached, the bitplane pointers are changed by the values contained in the BPLxMOD registers, and under normal circumstances these registers are set to zero. The BPLxMOD registers will be covered more fully later on in this section.

So far, this information assumes that the playfields are designed to be the same size as the area displayed. It is perfectly possible to design a collection of outsize playfields larger than the display area, and display a portion of each. The playfield can be extra-tall, extra-wide or both, making screen scrolling almost ridiculously easy in comparison with other computers such as the Atari ST.

To manage extra-tall playfields is simplicity itself. Simply alter the values of BPLxPTH/L used as the start point for vertical scrolling. If this cannot be done in a Copperlist, use the 68000 during the VBL. One way of making the Copperlist handle it is to write to the Copperlist directly, using the Copper interrupt to signal that the Copper has executed beyond the point at which you wish to write the new addresses into the Copperlist, and performing the write operation during the Copper interrupt. Alternatively one can use the 68000 during the VBL interrupt, storing the true base pointers and the scrolled values somewhere safe beforehand, and updating the scrolled values each time a scroll is performed.

Note that I mention using the Copper interrupt to signal that it is safe to write into the Copperlist. If this is not done, it is possible to write into the Copperlist at the same point being accessed by the Copper, with the result that the Copper gets the wrong data. Use of the Copper interrupt ensures that the Copper has genuinely finished with the portion of the Copperlist being rewritten.
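The pointer bookkeeping for an extra-tall playfield can be sketched as follows. This is Python used as a model of the arithmetic only; the function name, the example base addresses and the 40-byte row width are all assumptions made for illustration. It computes, for each bitplane, the high and low words that would be written to BPLxPTH and BPLxPTL (or poked into the Copperlist) for a given top line.

```python
def scrolled_plane_pointers(plane_bases, bytes_per_row, first_row):
    """Return a (high_word, low_word) pair per bitplane.

    plane_bases   : true start address of each bitplane (kept safe, never altered)
    bytes_per_row : width of one playfield row in bytes
    first_row     : playfield row to show at the top of the display
    """
    pairs = []
    for base in plane_bases:
        addr = base + first_row * bytes_per_row   # scrolled start address
        pairs.append(((addr >> 16) & 0xFFFF,      # goes to BPLxPTH
                      addr & 0xFFFF))             # goes to BPLxPTL
    return pairs


if __name__ == "__main__":
    # Two hypothetical bitplanes in CHIP RAM, scrolled down by 10 lines.
    for high, low in scrolled_plane_pointers([0x20000, 0x2A000], 40, 10):
        print("PTH=$%04X PTL=$%04X" % (high, low))
```

Keeping the true base addresses separate from the scrolled values, as here, means a scroll never loses track of where the playfield really starts.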
Managing extra-wide playfields is a little more complicated, but the hardware makes for almost unbelievable speed in pixel-boundary horizontal scrolling. BPLCON1 (not BPLCON2, which handles priorities) is used to control the pixel offset from 0-15 (remember, the DMA system accesses bitplanes in 16-bit words, corresponding to 16 pixels of bitplane data). Bits 15-8 of BPLCON1 are unused. Bits 7-4 are used to control the pixel offset for the even planes, and bits 3-0 are used to control the pixel offset for the odd planes. BPLCON1 determines the number of pixels by which the display is delayed, which moves the picture to the RIGHT, so for scrolling to the LEFT by X pixels, one must use 16-X as the scroll value and add 2 to each of the bitplane pointers.

Also, to ensure that the extra-wide playfield is properly displayed, there exist modulo registers. Modulo registers are used extensively within the Amiga hardware, particularly by the blitter. A modulo register contains a value to be added on to a pointer register pair value in order to ensure that the pointer points to the correct data word after a series of operations.

An example will illustrate. Let us create a 640-pixel low-resolution display. This is twice as wide as the standard display of 320 pixels. After reading the first 20 words of the display (40 bytes), the bitplane pointers are pointing to word 21. In a normal display, the BPLxMOD registers contain zero, and this is added on to the BPLxPTH/L values to reference the next line. For our double-width playfield, this is not correct. We want to skip another 20 words (40 bytes) to reference the second line correctly. This is done by setting BPLxMOD to 40. This is then added on to the BPLxPTH/L pairs by the system and the second line of our double-width playfield is thus referenced correctly by the DMA system.

Of course, both can be combined to make a huge display area, the sole limitations being available CHIP memory and your imagination. This can then be scrolled around at will.
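The modulo in the double-width example is simply the part of each playfield row that the display does not fetch. A minimal sketch in Python (a model of the sum, not Amiga code; the function name is hypothetical, and it deliberately ignores any extra word that might be fetched when fine scrolling is in use):

```python
def bitplane_modulo(playfield_width_pixels, display_width_pixels):
    """Return the BPLxMOD value in bytes for an outsize playfield."""
    playfield_bytes = playfield_width_pixels // 8   # one row of the playfield
    fetched_bytes = display_width_pixels // 8       # what the DMA reads per line
    if fetched_bytes > playfield_bytes:
        raise ValueError("display is wider than the playfield")
    return playfield_bytes - fetched_bytes


if __name__ == "__main__":
    print(bitplane_modulo(640, 320))   # the double-width example in the text
    print(bitplane_modulo(320, 320))   # a normal display
```

For the 640-pixel playfield shown through a 320-pixel window this gives 40, and for a playfield the same size as the display it gives zero, matching the figures in the text.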
Note that for smooth scrolling, the scroll values MUST be changed outside the time used for displaying the actual bitplanes. This again is possible using the Copper or the VBL interrupt as above.

Basically, scrolling smoothly is achieved by keeping a pixel scroll value saved somewhere as well as the bitplane pointers (remember that a larger scroll value moves the picture further to the right). To scroll left, take the pixel scroll value, subtract 1, AND with $0F and save back. If the result equals $0F, add 2 to all bitplane pointers. Then write these values into the various BPLCON1/BPLxPTH/L registers. For smooth right scrolling, take the pixel scroll value, add 1, AND with $0F, save back. If the result is zero then subtract 2 from all bitplane pointers. Write all of the resulting data into the requisite registers. So now you know.

Double-Buffering:at this point, any reader having digested both of the sections on Copperlists and Bitplane Control will have all the information to hand to perform double-buffering in hardware. Set up a Copperlist for the desired screen, complete with bitplane pointer initialisation. I find it useful to refer to the screen within which rendering is performed as the logical screen, and the screen currently being displayed as the physical screen. The double-buffering technique keeps these screens separate. Set up the Copperlist initially to point to one of the screens, which will become the physical screen. Perform all rendering in the other screen, which will become the logical screen, and use the Copper interrupt to determine when it is safe to change the bitplane pointer initialisation section to restart the Copperlist after the VBL with the identities of the two screens changed. The previous logical screen, within which one has rendered all graphic data, will become the new physical screen, and the current physical screen will then become the new logical screen. Again, perform all rendering in the logical screen.
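The logical/physical bookkeeping can be sketched like this. It is Python standing in for what would really be a few longwords and an interrupt routine; the class name and the screen addresses are hypothetical, and the actual write into the Copperlist is left out.

```python
class DoubleBuffer:
    """Tracks which screen is displayed and which is being drawn into."""

    def __init__(self, screen_a, screen_b):
        self.physical = screen_a   # address the Copperlist currently displays
        self.logical = screen_b    # address all rendering goes to

    def swap(self):
        """Exchange the two roles once rendering of a frame is complete.

        Returns the new physical address, i.e. the one to write into the
        bitplane pointer section of the Copperlist when it is safe to do so.
        """
        self.physical, self.logical = self.logical, self.physical
        return self.physical


if __name__ == "__main__":
    screens = DoubleBuffer(0x30000, 0x40000)
    print("draw into $%05X" % screens.logical)
    print("now show  $%05X" % screens.swap())
    print("draw into $%05X" % screens.logical)
```

The point of the sketch is only that the swap is an exchange of two addresses; no screen data is ever copied.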
Provided that all rendering can be performed within the time taken to display one frame (1/50th of a second), the resulting motions of graphic entities within your program will be completely smooth and flicker-free. This technique requires two sets of screen memory and is thus memory-hungry, but it is the basic technique for most games requiring smooth object motions. I shall add at this point that it may be possible to achieve smooth motion by this means even if it takes up to three frames to perform all rendering, as the movement of objects within 'Strike Force Harrier' on the ST is reasonably smooth (I once worked for the author of that game) even though frame swapping only occurs at 16 frames per second. With the Amiga's far superior hardware it should be possible to perform similar rendering at 24 frames per second or even faster (Strike Force Harrier has up to 32 bob-type objects on screen at once, hence the time taken for rendering!), and 50 frames per second animation is perfectly possible with fewer objects to move, unless they are truly huge (but see 'Menace', some of whose bobs are of a vast size). With dual playfield mode and oversize playfields, it's even possible to perform fast rendering using the blitter (see later) and perform parallax scrolling or even two-direction scrolling!

Hardware:Sprite Management

This section has not been thoroughly tested by me for all of the possibilities, because I haven't had occasion to use hardware sprites yet. However, a mini-preamble will serve to open up ideas.

Denise, the chip responsible for sprite management, is a high-speed sprite processor using its own DMA channels (8 in all). The existence of 8 DMA channels for sprite processing does NOT limit the programmer to 8 sprites as on some lesser systems, and with clever programming it is possible to have up to 72 sprites moving about at once!
But before explaining sprite DMA channel reuse, the technique allowing this, the fundamentals should be covered.

First, some limitations. A hardware sprite has a maximum width of 16 pixels. It can be any size vertically up to the size of the entire screen if wanted, though usually programmers work with 16x16 sprites or similar. A sprite can be displayed anywhere on the screen, and appears in front of the playfields. The Intuition mouse pointer is a hardware sprite, in actual fact sprite 0, and the DMA allocation within a raster line never allows the time allocated to sprite 0 to be stolen by bitplane DMA. If no other sprites are used, the remaining sprite DMA slots CAN be stolen by bitplane DMA for really wide displays, but it is a good idea not to do this during first experimentation with sprite management.

Also, a sprite is normally a 3-colour entity. It is possible to have a 15-colour sprite by combining two sprites together. The restriction here is that the sprites MUST be combined as follows:sprite 0 with sprite 1, sprite 2 with sprite 3, sprite 4 with sprite 5, and sprite 6 with sprite 7. No other order is allowed.

Sprite colours are allocated differently for 3-colour sprites and 15-colour sprites. For 3-colour sprites, the allocations are:

Sprite No    Colour Registers
---------    ----------------
   0,1       17,18,19 (16 not used)
   2,3       21,22,23 (20 not used)
   4,5       25,26,27 (24 not used)
   6,7       29,30,31 (28 not used)

The unused colours are treated as 'transparent', i.e., the playfield data shows through where 'colour 0' pixels appear in the sprite, the 'colour 0' for each sprite being thought of as corresponding to the unused colour registers. So, if sprite 0 has pixel colours 0,1,2,3, the actual colours used are transparent,17,18,19. Needless to say, colours will only clash with the other graphic objects on 5-bitplane screens, extra-halfbrite or HAM screens in single-playfield mode.
If your screen is 4 bitplanes or less, sprite colours are independent of the main screen colours.

To put sprites on screen, almost all that is required is that the programmer constructs a sprite data list in CHIP RAM, and passes a pointer to the start of the sprite data list to Denise's sprite control registers. Once that has been done, the DMA system handles the sprite all by itself.

A sprite data list consists of two control words, followed by the sprite data itself, and then two more control words, which for a standard sprite are zero to tell Denise that no more sprite processing is to be performed using this DMA channel. The initial two control words tell Denise where the sprite is to be displayed, and also if two 3-colour sprites are combined to form one 15-colour sprite.

Now the bad news. Allocation of bits in the sprite control words is awkward to say the least. It runs as follows:

Control Word 1 : EEEEEEEEHHHHHHHH
Control Word 2 : LLLLLLLLA0000ELH

The E bits represent the first line of the sprite (called VSTART). Control word 1 contains bits E7-E0 reading left to right, and control word 2 contains E8. Bits E8-E0 make up the VSTART parameter.

The H bits represent the horizontal position of the sprite. Control word 1 contains H8-H1 reading from left to right, and control word 2 contains H0. Bits H8-H0 make up the horizontal position parameter, called HSTART.

The L bits represent the last line of the sprite plus one. Control word 2 contains L7-L0 in the high byte, reading from left to right, and L8 at bit 1 of the low byte. I told you it was bloody awkward! This value is known as VSTOP.

The A bit in the second control word is the ATTACH bit. It tells the sprite DMA system that this sprite is attached to another sprite, and is set if this is the case (BUT ONLY IN THE SPRITE DATA LISTS FOR SPRITES 1,3,5,7!).
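Since the bit shuffling is so awkward, here is a minimal sketch of the packing in Python, intended as a desk-check of the layout above rather than as Amiga code. The function name and its parameters are hypothetical; the bit positions are taken straight from the EEEEEEEEHHHHHHHH / LLLLLLLLA0000ELH diagram.

```python
def sprite_control_words(hstart, vstart, height, attach=False):
    """Pack HSTART, VSTART and VSTOP into the two sprite control words."""
    vstop = vstart + height                  # last line of the sprite plus one

    word1 = ((vstart & 0xFF) << 8)           # E7-E0 in the high byte
    word1 |= (hstart >> 1) & 0xFF            # H8-H1 in the low byte

    word2 = ((vstop & 0xFF) << 8)            # L7-L0 in the high byte
    word2 |= 0x80 if attach else 0           # A, the ATTACH bit (bit 7)
    word2 |= ((vstart >> 8) & 1) << 2        # E8 at bit 2
    word2 |= ((vstop >> 8) & 1) << 1         # L8 at bit 1
    word2 |= hstart & 1                      # H0 at bit 0
    return word1, word2


if __name__ == "__main__":
    w1, w2 = sprite_control_words(hstart=129, vstart=44, height=16)
    print("control words: $%04X $%04X" % (w1, w2))
```

A 16-line sprite at HSTART=129, VSTART=44 packs to $2C40 and $3C01; the stray 1 in the second word is H0, which shows how easily the low bit of the horizontal position gets forgotten.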
The comment in the Amiga System Programmer's Guide that the control word bits are divided somewhat impractically between these two control words is a masterpiece of understatement!

Sprite resolution is one low-resolution pixel horizontally, and one raster line vertically, and these values are constant since sprite DMA is independent of the playfield modes.

Now, the sprite data list is formed as follows for a single sprite:

Control Word 1,       Control Word 2
Data Word 1 of L1,    Data Word 2 of L1
Data Word 1 of L2,    Data Word 2 of L2
Data Word 1 of L3,    Data Word 2 of L3
Data Word 1 of L4,    Data Word 2 of L4
...                   ...
Data Word 1 of LN,    Data Word 2 of LN
Zero Word,            Zero Word.

The sprite data is treated as being like mini bit planes, the data word 1 corresponding to 'bitplane 1' of the sprite, and data word 2 corresponding to 'bitplane 2' of the sprite. If a given bit in both words is 0, that pixel of the sprite is transparent, else the colour allocation is as given in the 3-colour sprite allocation table above. This preamble goes a long way toward explaining why I haven't bothered with them up to now!

Now, if the ATTACH bit is set in sprite 1, this tells Denise that sprite 1 is attached to sprite 0 to make a 15-colour sprite. In this case, the sprite is treated as a '4-bitplane' entity, and the allocations are:

Data word 1, sprite 0 : 'bitplane 1'
Data word 2, sprite 0 : 'bitplane 2'
Data word 1, sprite 1 : 'bitplane 3'
Data word 2, sprite 1 : 'bitplane 4'

The same applies to the other sprites in combination in ascending numerical order (PHEW!). At this point, I warn the reader that if the sprite positions of attached sprites do not match, Denise treats them as two separate sprites anyway. POSITIONS OF ATTACHED SPRITES MUST BE IDENTICAL FOR THEM TO BE TREATED AS ATTACHED SPRITES!

The colour allocations for a 15-colour sprite are transparent, then colour registers 17 to 31 in ascending order, according to the value extracted from a given bit position in each sprite data word.
If, for example, bit 4 of each sprite data word is 1,0,1,1 in the order above, this corresponds to a colour value of %1101 or 13, and colour register 29 provides the colour value for this pixel of the sprite. The colour register is 16+pixel value, unless the pixel is %0000, in which case it's transparent. Again, PHEW!

Sprite DMA channel reuse:I mentioned earlier that it was possible to display many sprites using the phrase 'sprite DMA channel reuse'. This means that the two end control words of the sprite are not zero. To reuse a sprite DMA channel, append the entire sprite data list of a second sprite onto the first, replacing the zero control words of the first sprite with the starting control words of the second. Again, if no more sprites are to be displayed, the final control words of the entire list are zero, else the procedure of appending a sprite data list continues for as many sprites as required, bearing in mind an important limitation:there must be at least one raster line between sprites thus appended into a reuse list, to give the DMA time to read in the new control words.

OK, what if you don't want to use all 8 sprites? Well, turning on sprite DMA activates all 8 sprite DMA channels, and so the unused ones must be passed a pair of zero control words to render them inactive. One can use the existing zero control words at the end of some genuine sprites for this purpose.

Now for the important part. You have created your sprite data lists and want to see them activated. Write the address of the start of each of your sprite lists to the SPRxPTH/L register pairs.
The offsets for each of these registers are:

SPR0PTH : offset $120      SPR0PTL : offset $122
SPR1PTH : offset $124      SPR1PTL : offset $126
SPR2PTH : offset $128      SPR2PTL : offset $12A
SPR3PTH : offset $12C      SPR3PTL : offset $12E
SPR4PTH : offset $130      SPR4PTL : offset $132
SPR5PTH : offset $134      SPR5PTL : offset $136
SPR6PTH : offset $138      SPR6PTL : offset $13A
SPR7PTH : offset $13C      SPR7PTL : offset $13E

This can be done using the Copperlist as might be expected, or the hard way using the 68000. In any case, initialisation of all of these pointers MUST be performed in the vertical blank interval if sprite DMA is enabled even if the registers are pointed at zero control words to disable them. Furthermore the values stored in these registers change during sprite DMA usage, and so every time the vertical blank interval occurs, the SPRxPTH/L registers must be re-initialised, either by the 68000 or using a Copperlist.

Moving the sprites by changing the position data in the initial control words must also be performed during the vertical blank interval to ensure that Denise receives the correct data, else your sprites could jump all over the screen in a weird and wonderful fashion! You can use the Copper interrupt to signal that it's safe to change them if using a Copperlist to initialise the SPRxPTH/L values, by ensuring that the change only occurs AFTER the Copper has initialised the registers. Upon initialisation, the control words are read immediately & held for processing until the correct beam position has been reached for displaying them, and once read, the values in the sprite data lists can be changed safely.

Sprite and Playfield Priority:Having introduced the reader to the hellish delights of the Sprite management system's basic control registers, I now wish to make life even more complex by introducing sprite/playfield priority allocation.

First of all, the lower the sprite number, the higher the priority.
This means that sprite 0 has higher priority than sprite 1, etc., and that the sprites are displayed as though they were on separate planes, the plane for sprite 0 being in front of the other sprite planes. In actuality, the sprite priorities are grouped into the same pairs as for sprite attachment, particularly when the playfields are brought into the whole picture.

So, considering the sprites as paired for priority purposes, in the same manner as for sprite attachment, a playfield can be arranged in order of priority according to the table below. In this table, P represents the playfield, and the digit pairs 01, 23, etc., represent the sprite pairs. All possible combinations are given below, the element to the left of the priority arrangement entry list being that with the highest priority.

Playfield Pos    Priority Arrangement
-------------    --------------------
      0          P  01 23 45 67
      1          01 P  23 45 67
      2          01 23 P  45 67
      3          01 23 45 P  67
      4          01 23 45 67 P

Now, if only one playfield is selected, then this table holds for that playfield. If dual playfield mode is selected, then this table holds for each of the playfields INDEPENDENTLY (with a few limitations). The selection of the priorities is controlled by BPLCON2 (offset $104), whose bits are allocated as follows:

Bit     Function
---     --------
15-7    Unused
6       Playfield Relative Priority
5-3     Priority of Playfield 2 rel. to sprites
2-0     Priority of Playfield 1 rel. to sprites

If bit 6 of this word is set in dual-playfield mode, playfield 2 is deemed to have higher priority than playfield 1, else playfield 1 has priority over playfield 2 (the usual state of affairs). Bits 5-3 determine the priority of playfield 2 relative to the sprites, and the 3-bit value to insert here is the value in the sprite/playfield priority table above labelled 'Playfield Pos', corresponding to the given sprite/playfield priority in the table.
In the same way, the 3-bit value for bits 2-0 determining the priority of playfield 1 relative to the sprites is chosen from the above table.

Now, since the two playfields have a relative priority, and each of the playfields has its own independent priority relative to the sprites, it is a fair question to ask whether the playfields' priority relative to each other has precedence over their priority relative to the sprites. The answer is YES. In the Amiga System Programmer's Guide, an example is given for the value BPLCON2 = $0003. Here, bit 6 is zero, so playfield 1 should be in front of playfield 2. Bits 5-3 are zero, so playfield 2 should appear in front of all sprites from 0-7. Bits 2-0 have the value 3, meaning that playfield 1 is in front of sprites 6 & 7, and behind all of the others.

A quick glance at this description shows something amiss. Playfield 2 cannot be in front of all sprites and at the same time behind playfield 1 (which is behind sprites 0 to 5). When one of the sprites 0-5 is BETWEEN playfields 1 and 2, it appears in front of playfield 1, according to its priority. Since this is in front of playfield 2, the sprite is visible at this point, although it must actually be behind playfield 2. If only playfield 2 and the sprite are at a given position, playfield 2 covers the sprite because of its priority.

In single playfield mode, bits 6-3 have no function, and should be set to zero. Bits 2-0 still control sprite priorities, and the position of the single playfield relative to the sprites.

Sprite Collision:those readers wishing that they had never bothered with sprite handling after reaching this point due to the complexity of the sprite management system are about to burst into tears over sprite collision. Firstly, let us overview the basic principles of object collision.
The fundamental principle of graphic element collision is this:when two graphic elements overlap at a screen position, and both objects have a set pixel at the same screen position, this is treated as a collision between the two graphic elements. More sophisticated collision algorithms for certain purposes do exist, but these will be ignored here, as they are not implemented in the Amiga hardware, as will the coordinate comparison algorithm (which is simpler still in some respects and very quick if speed is of the essence).

When a collision between graphic elements occurs on the Amiga, it is signalled by setting a bit in the CLXDAT register (offset $00E), which is a read-only register from the 68000's point of view (only the sprite management and blitter DMA can write to this register). The bit allocations of CLXDAT are as follows:

Bit    Function
---    --------
15     Unused
14     Sprite 4/5 collides with sprite 6/7
13     Sprite 2/3 collides with sprite 6/7
12     Sprite 2/3 collides with sprite 4/5
11     Sprite 0/1 collides with sprite 6/7
10     Sprite 0/1 collides with sprite 4/5
9      Sprite 0/1 collides with sprite 2/3
8      Playfield 2 collides with sprite 6/7
7      Playfield 2 collides with sprite 4/5
6      Playfield 2 collides with sprite 2/3
5      Playfield 2 collides with sprite 0/1
4      Playfield 1 collides with sprite 6/7
3      Playfield 1 collides with sprite 4/5
2      Playfield 1 collides with sprite 2/3
1      Playfield 1 collides with sprite 0/1
0      Playfield 1 collides with playfield 2

The rules for collision detection are that any non-transparent sprite pixel can cause a collision to be registered. However, it is possible to decide by appropriate programming which playfield bitplanes are used in the determination of collision detection. Also, it is possible to include or exclude any odd-numbered sprite from collision detection.
Which graphic elements are used for collision detection purposes is decided by programming the CLXCON register (offset $098), which is a write-only register from the point of view of the 68000. The bit allocations for CLXCON are as follows:

Bit    Function
---    --------
15     Enable collision detection, sprite 7
14     Enable collision detection, sprite 5
13     Enable collision detection, sprite 3
12     Enable collision detection, sprite 1
11     Use bitplane 6 for collision detection
10     Use bitplane 5 for collision detection
9      Use bitplane 4 for collision detection
8      Use bitplane 3 for collision detection
7      Use bitplane 2 for collision detection
6      Use bitplane 1 for collision detection
5      Bitplane 6 collision match bit
4      Bitplane 5 collision match bit
3      Bitplane 4 collision match bit
2      Bitplane 3 collision match bit
1      Bitplane 2 collision match bit
0      Bitplane 1 collision match bit

The first problem the programmer encounters is that collisions between adjacent numbered sprites used for sprite attachment cannot be detected. Thus a collision between sprites 0 and 1, for example, will not be registered in the CLXDAT register. Collisions between sprite 0 and any sprite from 2 to 7, or between sprite 1 and sprites 2 to 7, will be registered if the appropriate control bits are set. If the bit to enable sprite 1 collision detection is cleared, only sprite 0 collisions with other sprites and/or the playfields will be reported. If the bit is set, then BOTH sprite 0 AND sprite 1 collisions will be reported, and furthermore reported in the same CLXDAT bit! Thus choice of sprites needs to be handled carefully if distinctions between the sprites are important for collision purposes, because in the example I have just cited, sprites 0 & 1 use the same bit of CLXDAT for collision detection and thus telling them apart is impossible by merely scanning CLXDAT.
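Scanning CLXDAT amounts to testing each bit against the allocation table given for that register. A minimal sketch in Python, as a model of the decoding only (the names CLXDAT_BITS and decode_clxdat are hypothetical, and on a real machine the value would of course come from reading offset $00E):

```python
# Bit number -> the two elements reported as colliding, per the CLXDAT table.
CLXDAT_BITS = {
    14: ("sprite 4/5", "sprite 6/7"),
    13: ("sprite 2/3", "sprite 6/7"),
    12: ("sprite 2/3", "sprite 4/5"),
    11: ("sprite 0/1", "sprite 6/7"),
    10: ("sprite 0/1", "sprite 4/5"),
    9: ("sprite 0/1", "sprite 2/3"),
    8: ("playfield 2", "sprite 6/7"),
    7: ("playfield 2", "sprite 4/5"),
    6: ("playfield 2", "sprite 2/3"),
    5: ("playfield 2", "sprite 0/1"),
    4: ("playfield 1", "sprite 6/7"),
    3: ("playfield 1", "sprite 4/5"),
    2: ("playfield 1", "sprite 2/3"),
    1: ("playfield 1", "sprite 0/1"),
    0: ("playfield 1", "playfield 2"),
}


def decode_clxdat(value):
    """Return the colliding pairs flagged in a CLXDAT value, lowest bit first."""
    return [CLXDAT_BITS[bit] for bit in sorted(CLXDAT_BITS) if value & (1 << bit)]


if __name__ == "__main__":
    for first, second in decode_clxdat(0x0202):
        print("%s collides with %s" % (first, second))
```

Note that the table can only ever name a sprite pair such as 'sprite 0/1', which is exactly the limitation described above: the register cannot say which member of the pair was hit.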
If two sprites have been combined into a single 15-colour sprite using the ATTACH bit, the corresponding bits for odd sprite collision detection must be set in CLXCON in order for collision detection to be performed correctly.

For the playfields, the level of control is much more complete. If a given bit is set in the CLXCON register above from bits 11 to 6, the relevant bitplanes will be used in collision detection. Bits 5 to 0 are called Match Bit Plane Value bits, and are used to determine what values to use for the comparison before reporting collision detection.

Let us assume that we are using 6 bitplanes, and that the BPLCON0 value has enabled all 6 bitplane DMA channels (see the Bitplane Control section above). If one of the bits 11 to 6 (called the Enable Bitplane bits) is set, that bitplane is used. The corresponding match bit (bits 5 to 0) in the CLXCON register is then used to compare with the given pixel. Let us assume that the Enable bit for bitplane 3 (bit 8 of CLXCON) is set. If the value of the pixel data on bitplane 3 at the collision detection point MATCHES the value of the match bit for bitplane 3 in CLXCON (bit 2) then a collision is reported. So if the pixel bit is 0, and the match bit is 0 also, the collision is reported, as in the case when the pixel bit is 1 and the match bit in CLXCON is 1. If we don't care about a particular bitplane (say for example we wish to ignore bitplane 1 altogether for collision detection), clear the Enable bitplane bit for bitplane 1 (bit 6). Now, the value of bit 0 of CLXCON doesn't matter: the collision will be reported regardless of the state of the pixels on bitplane 1.

The table given in the Amiga System Programmer's Guide is reproduced here for those who want it. It correlates directly with the above description of bitplane selection & collision detection control. The Enable Bitplane bits are referred to as ENBPx, the Match Bitplane Value bits as MVBPx.
The 'x' bits in the table below are "don't care" bits; they can be either 0 or 1.

ENBPx     MVBPx     Collision possible with bit pattern
------    ------    -----------------------------------
111111    111111    111111 only
111111    111000    111000 only
111100    1111xx    111100, 111101, 111110, 111111 only
011111    x00000    000000, 100000 only
000000    xxxxxx    Any bit pattern!!!

Take note, that if fewer than 6 bitplanes are used, the ENBPx bits for the unused bitplanes MUST be set to zero!

Needless to say, if the colours are chosen suitably, various collision strategies based upon colour can be constructed, as well as strategies based directly upon bitplane management. It is possible, for example, to set sprite collision to register with only red and green pixels of a playfield, or collision with the transparent points of playfield 1 to register only if the underlying pixels of playfield 2 are black.

Spurious Sprite Video Data:occasionally software that directly controls the hardware suffers from the appearance of a line down the screen at some point. Analysis usually (but not always) shows that this line corresponds in position to the position of the CLI sprite pointer prior to switching off sprite DMA. If this is done before the sprite management system has finished displaying the CLI sprite pointer, then when sprite DMA is turned off, the sprite management system cannot read the end control words, and thus continues displaying the sprite data onscreen. Even when sprite DMA is re-enabled, the contents of the sprite data pointers may not point to zero control words, in which case spurious sprite video data may continue to appear until the sprite data pointers are changed.

There are two ways to prevent this. The first technique is to point the sprite data pointer registers at zero control words.
The second technique (thanks to Count Zero) is to wait for the electron beam to reach a position beyond the maximum possible display position of the CLI sprite pointer, and then turn off the sprite DMA (use a raster line value of 300 for PAL Amigas).

Other Sprite Registers:there exist other sprite data registers, that are normally accessed by the sprite management system DMA alone. These registers can be accessed in software by the 68000 also, and these registers are:

SPR0POS  (offset $140)    SPR0CTL  (offset $142)
SPR0DATA (offset $144)    SPR0DATB (offset $146)
...                       ...
SPR7POS  (offset $178)    SPR7CTL  (offset $17A)
SPR7DATA (offset $17C)    SPR7DATB (offset $17E)

The registers for SPRxPOS onwards occupy the entire range of offsets from $140 to $17E, in ascending sprite number order.

When the programmer assigns sprite management control to the standard sprite management DMA channels, the sequence of events is:

1) DMA system loads two control words into SPRxPOS (control word 1) and SPRxCTL (control word 2).

2) DMA system turns off sprite output.

3) DMA controller waits for the electron beam to reach the value in the VSTART portion of the sprite control words.

4) Once this position is reached, data words are written into SPRxDATA and SPRxDATB.

5) DMA controller turns on sprite output again, and the values in SPRxDATA and SPRxDATB are used for the current raster line. These are positioned according to the HSTART value.

6) DMA controller continues reading data into SPRxDATA/B and displaying it until the VSTOP value is reached.

7) The DMA controller reads the two control words at the end of the sprite data list. If these are non-zero, the sprite data channel is being re-used, and the sequence of events begins again at 1).

8) If the DMA system encounters the two zero control words at the end of the sprite data list, the DMA controller turns off sprite data output on this DMA channel until the vertical blank interval occurs, at which point the sprite DMA begins its display sequence at 1) again.

If the programmer uses the 68000 to access these registers, normally used solely by the DMA controller (but accessible to the 68000), then there are a few changes to take note of. First, the sprite data pointer registers need to be initialised as for sprite management via DMA; this remains the same. The sprite data list contents change somewhat, however. If the 68000 is used to load the SPRxPOS/SPRxCTL registers, then only the HSTART value in the sprite data list control words, plus the value of the ATTACH bit, need to be valid. VSTART and VSTOP are used only by the DMA controller.

Sprite data output can then begin by writing data to the SPRxDATA/B registers. Write to SPRxDATB first, as writing to SPRxDATA causes the data from both registers to be output to the screen. Note that if the DMA controller is bypassed in this way, fresh data needs to be supplied for each line of the sprite to the SPRxDATA/B registers if needed (normally this is performed by the DMA controller) unless all that is wanted is a solid column of identical pixel data. To turn off the sprite again, simply write some value to SPRxPOS. I suggest writing the value zero.

Hardware:The Blitter

The blitter is THE chip that makes the Amiga so special, and it will come as no surprise to realise that this section will probably be the largest section in this file.
It has the capacity to move data at a peak speed of 16 million pixels per second, perform logical operations upon its data sources before generating its output, and is used for three principal functions: 1) Transferring rectangular graphic data blocks to screen bitplane memory (with logical operations to change the plotting method) 2) Drawing lines between any two points on screen 3) Filling bounded areas (taking account of some restrictions) to create filled polygon shapes This is not its entire repertoire, however. The blitter contains all of the on-chip logic necessary to perform vector-arcitecture mathematics, but is prevented from doing so directly by its hard-wired design. However, it is possible to make the blitter perform high-speed computational functions on large blocks of data in one go, provided that one is familiar with Boolean Algebra and has at least a first-year university level grounding in formal logic up to the level of alternational normal schemata (I recommend as THE definitive text on formal logic for those interested to be 'Methods In Logic' by Willard Van Ormand Quine - ask for the Library of Congress record number in preference to the ISBN number as it's an American publication). Having whetted the appetite, now comes a small amount of bad news. The blitter is powerful, but its power has associated with it a certain am- ount of complexity. In particular, although it has a well-defined set of registers for function control, the bit allocations of the BLTCONx control registers (see register list below) change dramatically with each function. There are also several rigid conventions to follow, otherwise the blitter may just scribble at high speed all over critical program memory, as once started up, it cannot be stopped halfway. Blitter Register List:the list of registers associated with the blitter is: Register Offset Function -------- ------ -------- BLTDDAT 000 Blitter Data D (read-only!) 
        BLTCON0    040      Blitter Control Register 0
        BLTCON1    042      Blitter Control Register 1
        BLTAFWM    044      Blitter A first word mask
        BLTALWM    046      Blitter A last word mask
        BLTCPTH    048      Source C data pointer high word
        BLTCPTL    04A      Source C data pointer low word
        BLTBPTH    04C      Source B data pointer high word
        BLTBPTL    04E      Source B data pointer low word
        BLTAPTH    050      Source A data pointer high word
        BLTAPTL    052      Source A data pointer low word
        BLTDPTH    054      Destination D data pointer high word
        BLTDPTL    056      Destination D data pointer low word
        BLTSIZE    058      Controls data size/starts blitter
        BLTCMOD    060      Source C modulo register
        BLTBMOD    062      Source B modulo register
        BLTAMOD    064      Source A modulo register
        BLTDMOD    066      Destination D modulo register
        BLTCDAT    070      Blitter Source C Data
        BLTBDAT    072      Blitter Source B Data
        BLTADAT    074      Blitter Source A Data

Some of these registers are not accessed during standard blitter usage. They do come into play for some of the less well-publicised functions for which the blitter is used. More of this later.

Rectangular Data Block Moving: this is the first blitter function to be dealt with in this section, and a preamble will help introduce certain key concepts.

The blitter, when operating in data copy mode, takes data from up to three different source areas of memory, combines them using a logical operation, and then writes the data out to the destination memory area. The main use of this function is for copying large blocks of graphic data to bitplane memory for display, or copying portions of bitplane memory to safe off-screen areas for background saving.

In general, the source memory is linearly organised, and the graphic data occurs in sequential words in memory. At this point I stress that the blitter is a WORD-based device, and that all of the blitter's activities are based upon word-aligned memory, just like the 68000's program access. Returning to the graphic data, this linear organisation means that the data can be read sequentially without any problems.
However, to illustrate the blitter methodology, let there exist a low-resolution screen of one bitplane, and some graphic data that is to be written to the screen. The low-resolution bitplane is 40 bytes across, or 20 words, and the graphic data has a maximum width of 48 pixels (3 words). The pointer to the word on the screen bitplane where the data is to be put is written to the BLTDPTH/L registers, and the pointer to the graphic data is written to the BLTAPTH/L registers. For now let us ignore other sources and the actual details of the logical operation used.

The blitter functions by using the address pointers to fetch a data word from each source (here we are using one source only, source A), and after processing it internally, uses the pointer to the destination to write the data to the screen. It then increments the pointers by 2 to access the next word. Because the pointers are left modified when the operation ends, they MUST be re-initialised for each blitter operation (unless they happen to end up pointing to memory areas to be referenced by another blitter operation, in which case the blitter can simply be started up again).

For the graphic data, this poses no problem, since it is sequentially organised in memory. But after writing 3 words of data, the blitter must have a correction added to the destination pointer to point to the next raster line to write to. This correction is called the modulo, and is stored in the modulo register for the appropriate source and destination. In this example, the blitter needs to have a correction of 17 words (the 20 words of a raster line less the 3 just written), or 34 bytes, added to the destination pointer to reference the screen memory correctly. Thus the modulo for the destination, BLTDMOD, is set to 34. The modulo for source A, BLTAMOD, is set to zero. If other sources are used, the appropriate modulo must be selected. If source B is used as a graphic data mask organised in the same way as the actual pixel data, its modulo is again zero.
If the source C is used as a reference to the screen for background masking-in, its modulo must be the same as the destination modulo, i.e., 34 in this example.

So, the organisation of source and destination memory needs to be analysed before setting up both the BLTxPTH/L registers, and the BLTxMOD registers. This information should be sufficient to cover initialisation of the registers just mentioned.

Now we need to decide how much data to transfer. The blitter accepts this information as a single word, coded as follows:

        HHHHHHHHHHWWWWWW

The H bits represent the height of the graphic data. This can take any value from 0 to 1023 lines. Zero is taken to mean the maximum size of 1024 lines, as a value of zero lines is otherwise silly. The W bits represent the width of the graphic data in memory words (0 to 63, with zero likewise meaning 64 words); in our example this is 3. So if our data is 40 raster lines deep, the value of the H bits is 40, and the value of the W bits is 3. This gives a final value of

        (40 * 64) + 3 = $0A03

This value is written to the BLTSIZE register in the table above. The method I use for computing BLTSIZE is something of the order of

        move.w  rows(a0),d0     ;no of raster lines
        and.w   #$3FF,d0        ;ensure value is 0-1023
        lsl.w   #6,d0           ;shift into the H bits
        move.w  cols(a0),d1     ;no of WORDS across!
        and.w   #$3F,d1         ;ensure value is 0-63
        add.w   d1,d0           ;add it in
        move.w  d0,BLTSIZE(a5)  ;here a5 contains $DFF000...

However, writing to the BLTSIZE register starts the blitter! So this must be the LAST operation performed upon the blitter registers for a given blitter operation.

Now we need to consider logical operations. It is of great assistance if the programmer has a thorough grounding in Boolean Algebra at this point, since the method used by the blitter relies heavily upon this. The terminology used for each logical operation selected is 'minterm', which is short and sweet.
The formal technique for deriving minterms labours under the unfortunate name of 'developed alternational normal schema formation', a fair mouthful for anyone to handle.

The simple way of thinking about this is to remember that the blitter has 8 possible logical operations hard-wired into it. These operations are:

        ABC : A AND B AND C
        ABc : A AND B AND (NOT C)
        AbC : A AND (NOT B) AND C
        Abc : A AND (NOT B) AND (NOT C)
        aBC : (NOT A) AND B AND C
        aBc : (NOT A) AND B AND (NOT C)
        abC : (NOT A) AND (NOT B) AND C
        abc : (NOT A) AND (NOT B) AND (NOT C)

The next table shows, for each operation, the values of the input bits for which it produces a 1 (true) output bit, and the LFx bit that selects it:

        Operation   A  B  C   LFx Bit No.
        ---------   -  -  -   -----------
        ABC         1  1  1        7
        ABc         1  1  0        6
        AbC         1  0  1        5
        Abc         1  0  0        4
        aBC         0  1  1        3
        aBc         0  1  0        2
        abC         0  0  1        1
        abc         0  0  0        0

When the blitter processes its data from each source, A,B,C, it feeds the data into circuits whose outputs are each of the 8 logical operations above. These are then ORed together to produce the final result sent to the destination memory. Which ones are selected to OR together is under the control of the programmer. The LFx bit numbers in the table above are used to select them, and a byte with the appropriate bits set selects the given logical operation. So, to select ABC+ABc+AbC+Abc, one uses the select byte $F0 (or, %11110000 in binary).

But how do we select them? In our example, we want the destination to match the source input. In other words, D = A. But since B and C are always present in the blitter, how do we introduce them? Well, we don't care about the state of B, so we want operations containing both AB and Ab. In a like manner, we don't care about C, so we want operations containing AC and Ac. This is the same as performing the Boolean Algebra operations (here, + is taken to mean OR, and AB is taken to mean A AND B):

        A(B+b) = AB + Ab
        A(C+c) = AC + Ac

But we want terms of the form ABC etc.
Well, take the first expression

        A(B+b) = AB + Ab

and append the (C+c) used in the second:

        A(B+b)(C+c) = AB(C+c) + Ab(C+c)
                    = ABC + ABc + AbC + Abc

This just happens to be the example encoded as the select byte $F0 above. The principle is the same throughout. Some more examples are:

Invert Graphic Data : D = a

        D = a = a(B+b)(C+c)
              = aB(C+c) + ab(C+c)
              = aBC + aBc + abC + abc

Which is encoded as the select byte $0F.

OR in a graphic into the bitplane : D = A + C

        D = A + C = A(B+b)(C+c) + C(A+a)(B+b)
                  = AB(C+c) + Ab(C+c) + CA(B+b) + Ca(B+b)
                  = ABC + ABc + AbC + Abc + CAB + CAb + CaB + Cab
                  = ABC + ABc + AbC + Abc + ABC + AbC + aBC + abC
                  = ABC + ABc + AbC + Abc + aBC + abC

Which is encoded as the select byte $FA. Note that where identical terms appear in the expression above, surplus ones are simply deleted from the expression.

The last example I shall give is the so-called 'cookie-cut' operation. This name originates from the way cookie biscuits are cut from the biscuit mixture when making chocolate chip cookies, familiar to any American, especially one who has encountered 'Girl Scout' cookies. This operation is one where the data A is masked first. If the data A is 1 at this point, we want the masked data to be written. If the data A is 0, we want it to be regarded as transparent, and hence the background to show through. This allows the mask to create opaque 0 pixels within the data, and any 0 pixels in the A data to be regarded as transparent by having the corresponding mask pixel set to 1. The operation becomes:

Cookie cut : D = AB + aC

        D = AB(C+c) + a(B+b)C
          = ABC + ABc + aBC + abC

which encodes as the select byte $CA. In this case, the mask will be a negative image (photographically speaking) of the graphic data except where any transparent pixels are required. (But see the NOTE on source channel selection at the end of the data copy discussion: with the standard $CA minterm it is source A that should carry the mask.)
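The expansions above can be cross-checked mechanically. The following is a small host-side sketch in Python (not Amiga code, and not part of the original routines); the function name minterm_byte is my own invention. It assumes only the LFx bit numbering from the table earlier, i.e. that the bit number is A*4 + B*2 + C, and evaluates the wanted Boolean function for all 8 input combinations.

```python
def minterm_byte(func):
    """Return the LFx select byte for a Boolean function of (a, b, c).

    Assumes the LFx bit number is a*4 + b*2 + c, as in the table above.
    """
    value = 0
    for bit in range(8):
        a = (bit >> 2) & 1      # A is the most significant input
        b = (bit >> 1) & 1
        c = bit & 1
        if func(a, b, c):       # this minterm must be ORed into the result
            value |= 1 << bit
    return value


copy_a = minterm_byte(lambda a, b, c: a)                            # D = A
invert_a = minterm_byte(lambda a, b, c: not a)                      # D = a
or_a_c = minterm_byte(lambda a, b, c: a or c)                       # D = A + C
cookie = minterm_byte(lambda a, b, c: (a and b) or (not a and c))   # D = AB + aC

print(f"${copy_a:02X} ${invert_a:02X} ${or_a_c:02X} ${cookie:02X}")
```

Run on any machine with Python, this prints the same four select bytes derived by hand above ($F0, $0F, $FA and $CA), and it is a quick way of checking a new minterm before committing it to BLTCON0.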
So, we now know how to select which memory areas to transfer, how to set modulo values for transfer to bitplanes, how to determine the data size (and also start the blitter), and select the appropriate logical operation. I now wish to introduce the blitter control registers. These affect the manner in which the blitter works.

The two blitter control registers, BLTCON0 and BLTCON1, have control bits allocated according to the following tables:

        Bit     BLTCON0 Function
        ---     ----------------
        15-12   ASH3-0 : contain the source A shift distance (see later)
        11-8    USEA-D : select which sources/destination are used
        7-0     LFx    : logical function selection bits mentioned above

        Bit     BLTCON1 Function
        ---     ----------------
        15-12   BSH3-0 : contain the source B shift distance (see later)
        11-5    Unused
        4       EFE    : Exclusive Fill Enable
        3       IFE    : Inclusive Fill Enable
        2       FCI    : Fill Carry In
        1       DESC   : Descending mode
        0       LINE   : Turn on Line Drawing Mode

In the case of BLTCON0, the LFx bits that select the minterms have already been covered. This leaves the USEA-D bits and the ASH3-0 bits.

The USEx bits determine which sources/destination are used. If a given USEx bit is set, the DMA control system for that source (or destination) is fully enabled, in which case the blitter operation proceeds normally as described earlier. If a source is not being used at all, clear the appropriate USEx bit. This has the effect of disabling the DMA fetches for that channel, but NOT of removing the source from the logical operation. Because of this, minterms have to be chosen to effectively ignore the given source, as well as clearing the given USEx bit. Instead of fresh data being fetched, the same word contained in the BLTxDAT register for the given source is read continuously and not updated.
This can be used to fill memory with a given value, as follows:

        move.w  #value,BLTADAT(a5)      ;value to fill
        move.l  #-1,BLTAFWM(a5)         ;both A masks must let the data through
        move.w  #$01F0,BLTCON0(a5)      ;minterms D=A, USED only
        move.w  #0,BLTCON1(a5)          ;mode = copy data
        clr.w   BLTDMOD(a5)             ;destination is one contiguous block
        lea     memory_to_fill(pc),a0   ;start of memory area
        move.l  a0,BLTDPTH(a5)          ;point Blitter at it
        move.w  #size,BLTSIZE(a5)       ;how much to fill (BLTSIZE format) & startup!

If copying graphic data, pick the USEx bits carefully. Ensure that if you ARE using a given source, you set its USEx bit. And whatever you do, don't forget to set USED to enable the destination output (the number of programmers who have forgotten to set USED at some time doesn't bear thinking about) or else the blitter won't be able to generate the desired output!

The ASH3-0 bits are used to 'fine-tune' the blitter data positioning for graphic output. Normally, the blitter can only output its data to a word boundary, being a word-oriented device. To enable pixel-boundary data plotting, the blitter has the ability to shift input data before outputting it. The ASH3-0 bits contain the number of bit positions to shift the data from source A to the right before outputting it. The bits BSH3-0 in BLTCON1 have an almost identical function, this time affecting source B. Hence source A is usually chosen to be the graphic data, and source B any mask data used. If background data is used for transparency generation, source C is generally the preferred choice. Both graphic and mask are shifted before output.

Moving on to BLTCON1, for data copying, set all bits other than the BSH3-0 bits to zero. These bits are used for line drawing mode and boundary-filling mode, covered later.

To complete the register list, there are two mask registers for the data source A. These are BLTAFWM (Blitter source A first word mask) and BLTALWM (Blitter source A last word mask).
If these registers are both zero, the first and last word of each raster line of copied graphic data will be zeroed out; they are used as filter masks for the left and right edges of a graphic data block. If the graphic is one data word wide, the two registers operate upon the same source A data word, and only those source A bits that have the corresponding bits set in BOTH mask registers are allowed through unchanged (otherwise they are treated as zero). For wider graphic data blocks the mask registers mask the end words of the data. As yet, I have not had time to establish conclusively for myself whether source A masking is performed BEFORE or AFTER source A data shifting under BLTCON0 control (the Amiga Hardware Reference Manual states that the masks are applied before the shift), and as the result is critically dependent upon this, I suggest experimentation before assuming one way or the other.

As a final note concerning the blitter in data copy mode, I said above that the remaining BLTCON1 bits other than the BSH3-0 bits should be cleared. Normally this is the case (using the blitter for graphic data manipulation), but if the blitter is used to move memory blocks, particularly if the data blocks are overlapping, then it is time to consider setting the DESC bit. The DESC bit of BLTCON1 controls whether the address incrementing of the BLTxPTH/L registers is positive (ascending mode, DESC=0) or negative (descending mode, DESC=1).

An example illustrates the point. Let us copy a block of memory of size N bytes (N will have to be even for blitter copy), starting at address X, to a new location at address Y. If X lies lower in memory than Y, but the difference between X and Y is less than N bytes, then the initial copy operation will erase some of the data at the end of the block to be copied. To prevent this, it is possible to copy backwards by setting DESC=1. If this is done, however, the pointer registers should be set to the addresses of the LAST word of each block, X+N-2 and Y+N-2, instead of addresses X and Y, because copying will start at the END of the blocks.
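The choice between ascending and descending mode for an overlapping copy can be modelled on any host machine. The sketch below is Python rather than 68000 code, and the names plan_block_copy and simulate_copy are hypothetical. It assumes that in descending mode the pointers must address the last word of each block (N-2 bytes from the start), which is also what the fill routine later in this file arranges with its subq.l #2,a0, and it uses a bytearray as a stand-in for chip memory.

```python
def plan_block_copy(src, dst, nbytes):
    """Choose the DESC bit and start pointers for a word-wise block copy.

    Returns (desc, src_ptr, dst_ptr); all values are byte addresses.
    """
    if nbytes <= 0 or nbytes % 2:
        raise ValueError("the blitter copies whole words only")
    if src < dst < src + nbytes:
        # Destination overlaps the tail of the source: copy backwards,
        # starting from the LAST word of each block.
        return 1, src + nbytes - 2, dst + nbytes - 2
    return 0, src, dst


def simulate_copy(mem, src, dst, nbytes):
    """Word-by-word model of the copy; mem is a bytearray."""
    desc, s, d = plan_block_copy(src, dst, nbytes)
    step = -2 if desc else 2
    for _ in range(nbytes // 2):
        mem[d:d + 2] = mem[s:s + 2]     # move one word
        s += step
        d += step
    return desc


memory = bytearray(range(16))
print(simulate_copy(memory, 0, 4, 8), list(memory[4:12]))
```

With the block at offset 0 copied 4 bytes higher, the planner selects descending mode and the 8 bytes arrive intact; swapping the source and destination selects ascending mode instead.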
NOTE: until recently, I assumed that the choice of Source A for the graphic data, Source B for the mask, and Source C for the background when a masked blit was performed was OK. This has turned out to be incorrect. Source channel selection SHOULD be:

        Source A = MASK
        Source B = DATA
        Source C = BACKGROUND

If this order of selection is made, then the standard minterms mentioned in the Amiga Hardware Reference Manual hold true, e.g., $CA is the 'cookie-cut' minterm. See later for an example.

Line Drawing: the blitter has inbuilt line-drawing logic which was added by its designer, Jay Miner, after he discovered that the other functions left space on the silicon for the line-drawing hardware and decided that this would be a welcome feature to include (wow, a history lesson too!).

The problem with the blitter's line-drawing logic is that it uses a method unfamiliar to anyone having no experience of hardware geometry engines (a generic term used to describe high-speed graphics-specific processors). A line drawn with the blitter has to be described in a manner conforming to the method used. I shall try to make this simple, but if it seems hard going, I implore you to persevere: this information will apply to far more sophisticated geometry engines as well and hence has a wider application.

Lines are represented under normal circumstances using the start and end coordinates. If the end points of the line being drawn are P (x1,y1) and Q (x2,y2), this is usually sufficient information.
The alternative information used by the blitter and other geometry engines is:

1) the address of the memory word in the bitplane where the start point P lies;
2) the number of points that the line will occupy once drawn;
3) the angular orientation of the line, represented in terms of which 45 degree compass segment (or octant) the said angle lies in (measured anticlockwise from zero degrees at the X-axis: in this system of measurement 90 degrees is due North, 180 degrees is due West, 270 degrees is due South etc).

These compass segments, or octants, are defined using octant numbers, allocated according to the table:

        Angle Range        Octant No.
        -----------        ----------
        0-45 degrees           0
        45-90 degrees          1
        90-135 degrees         2
        135-180 degrees        3
        180-225 degrees        4
        225-270 degrees        5
        270-315 degrees        6
        315-360 degrees        7

Note that this table assumes that the Y-axis is drawn to be positive when pointing DOWNWARDS, thus setting up the usual screen coordinate system where (0,0) is the TOP LEFT CORNER of the screen.

The octant information is insufficient in itself, however. Before I explain the relationship between the octant numbers and the data actually sent to the blitter, I shall give the equations describing the line that are used by the blitter. These are:

        dX = X2 - X1            These are used to determine
        dY = Y2 - Y1            which octant the line lies in

        DX = ABS (dX)           DX = Delta X
        DY = ABS (dY)           DY = Delta Y

        DS = MIN(DX,DY)         DS = Delta S (smaller Delta)
        DL = MAX(DX,DY)         DL = Delta L (larger Delta)

These expressions form the basis of geometry engine line drawing (including that of the blitter, even if it cannot be regarded as a true geometry engine in the same manner as, for example, the Weitek 1164/1165 series used in the 1167 accelerator board for the Compaq DeskPro 386/20 PC, or the TMS34020).

Now for the hard part. Delta X, Delta Y and Delta S/L are used to determine which octant should be selected for the line.
This is done by the conversion of the octant number into a 3-bit number, which corresponds to three bits in the BLTCON1 register as used for line drawing control. How the bits in BLTCON1 are allocated changes when line drawing is performed. The bit allocations are:

        Bit No  Name        Function
        ------  ----        --------
        15-12   TEXTURE3-0  Value for mask shifting (see below)
        11-7    Unused      Always set to zero
        6       SIGN        Changes line drawing direction
        5       Unused      Always set to zero
        4       SUD         Sometimes Up or Down
        3       SUL         Sometimes Up or Left
        2       AUL         Always Up or Left
        1       SING        Singular bit (see below)
        0       LINE        Always set to 1 for line drawing

The SUD/SUL/AUL bits are set or cleared according to the following values:

        Octant No.   SUD  SUL  AUL
        ----------   ---  ---  ---
            0         1    0    0
            1         0    0    0
            2         1    1    0
            3         0    0    1
            4         1    0    1
            5         0    1    0
            6         1    1    1
            7         0    1    1

and the SIGN bit is set if the computed value of (2 * DS) - DL is less than zero (to change the direction in which the line is rendered for lines with a negative slope).

Under normal circumstances, the USEA-D bits in BLTCON0 should be set to the values:

        USEA = 1
        USEB = 0
        USEC = 1
        USED = 1

and the minterms set to $CA. The value of the ASH3-0 bits is set to the value x1 MOD 16 to determine which bit in the start word is the start bit of the line, and in most literature on blitter line drawing these bits change their name to the START3-0 bits.

The line can be drawn using a mask, to provide dotted lines according to a line dot pattern. This mask is written to BLTBDAT (see register list above), and for solid lines the value to use is $FFFF. A value of $AAAA or $5555 produces a finely dotted line, a value of $CCCC a more coarsely dotted line.

So, on to the blitter register initialisation values. These are as follows (using the values DX, DY, DS and DL above):

        BLTCPTH/L, BLTDPTH/L : Put the start address of the first point of the line in these registers.
        BLTCMOD, BLTDMOD     : Number of bytes making up one raster line of the bitplane within which the line is to be rendered.
                               For a low-res bitplane this is 40. See example code below.
        BLTBMOD              : Set to 2 * DS.
        BLTAMOD              : Set to (2 * DS) - (2 * DL).
        BLTAPTL              : Set to (2 * DS) - DL.
        BLTADAT              : Set to $8000
        BLTBDAT              : Set to your chosen pattern mask
        BLTAFWM              : Always set to $FFFF (as is BLTALWM)
        BLTSIZE              : Last register initialised (it sets the blitter going). Set width bits equal to 2 always. Set height bits equal to DL. Hence value to use is (DL * 64) + 2.

One final point. If the blitter is used to draw the closed border of a polygon, which is then to be filled by the blitter in boundary-fill mode as documented later, then the line should be drawn with only one pixel set per raster line. The blitter provides a line-drawing mode specially for this. To draw lines normally, set the SING bit in BLTCON1 to zero, and to draw lines in this special mode, set the SING bit to 1.

The following code can be lifted out of this file, and used at will. It is a drawline() routine complete with documentation which illustrates all of the above concepts in action. Note, in order to use this code, it is best either to kill off the operating system altogether or to use Forbid() to ensure that your task is the only one running within the system. My test code using this kills off Exec, but I haven't included this in case anyone wishes to use this code and keep the operating system alive.

* Blitter Line Drawing Code : Data Structures and Routines
* Assumes that all rendering is done in a low resolution
* screen of 4 bitplanes depth.

* new line structure definition V2.0. Can be defined statically using
* DC.W or generated by your program as required. Note that this code
* allows multiple lines to be drawn one after the other, and even allows
* mixing of SING and normal lines if wanted as well as lines of different
* colours in sequence!
                rsreset
line_screen     rs.l    1       ;ptr to 1st screen bitplane
                                ;(assumes continuity)
line_ssize      rs.w    1       ;size of 1 bitplane in bytes
line_smod       rs.w    1       ;screen modulo
line_coords     rs.l    1       ;ptr to line coord list

;workspace entries for drawline() routine

line_deltax     rs.w    1
line_deltay     rs.w    1
line_2S_L       rs.w    1
line_oct_bits   rs.b    1
line_pad        rs.b    1

line_sizeof     rs.w    0

* line coord list structure

                rsreset
lc_next         rs.l    1       ;=ptr to next coord list entry,
                                ;0 if last in list
lc_x1           rs.w    1
lc_y1           rs.w    1       ;coords of start point
lc_x2           rs.w    1
lc_y2           rs.w    1       ;coords of end point
lc_pattern      rs.w    1       ;should you want a dotted line...
lc_bits         rs.b    1       ;bitplanes & SING bit if wanted
lc_pad          rs.b    1       ;padding byte for alignment

lc_sizeof       rs.w    0

* lc_bits : 0-3 = colour (bitplanes in which drawn)
*         : 4 = SING bit for line draw

;drawline(a4,a5)
;a4 = ptr to line structure definition V2.0
;a5 = ptr to custom chip registers

;draws line(s) according to the contents of the line definition
;structure(s). Note : line definition structure is a header structure
;containing workspace used by drawline() in this version. Actual
;coordinate and bitplane information etc., contained in separate list
;pointed to by an entry in the line definition structure.

;Core algorithm from Amiga System Programmer's Guide. Several
;addenda of my own for multiple bitplane handling, SING mode
;drawing etc. (SING mode needed for polygon drawing prior to
;polygon fill-see elsewhere for more info).

;d0-d7/a0-a3 corrupted

drawline        move.l  line_coords(a4),a3

drawline_l0     moveq   #0,d1                   ;clear line octant selector
                move.w  lc_x2(a3),d0
                sub.w   lc_x1(a3),d0            ;compute deltax = x2-x1
                roxl.w  #1,d1                   ;condition octant selector
                tst.w   d0                      ;>0 or <0?
                bge.s   drawline_b1             ;>=0 so skip
                neg.w   d0                      ;else absolute value

drawline_b1     move.w  d0,line_deltax(a4)
                move.w  lc_y2(a3),d0
                sub.w   lc_y1(a3),d0            ;compute deltay = y2-y1
                roxl.w  #1,d1                   ;condition octant selector
                tst.w   d0
                bge.s   drawline_b2
                neg.w   d0                      ;absolute value again

drawline_b2     move.w  d0,line_deltay(a4)
                move.w  line_deltax(a4),d2
                move.w  d2,d3
                sub.w   d0,d3                   ;want largest of Dx,Dy
                roxl.w  #1,d1                   ;condition octant selector
                tst.w   d3
                bge.s   drawline_b3
                exg     d0,d2                   ;ensure smallest of the
                                                ;two in d0

;From here on, DS = Delta S, DL = Delta L, as in the book. DS = smallest
;of Dx, Dy, and DL = largest of Dx, Dy. I reuse the line_deltax(a4)
;entries in the structure for these, simply changing the order in
;which they appear if needed instead of having separate line_deltas()
;and line_deltal() entries. Trivial really.

;Some stuff is pre-calculated, and then saved in workspace entries
;provided in the line definition structure for this purpose.

drawline_b3     movem.w d0/d2,line_deltax(a4)   ;Dx = DS, Dy = DL
                lea     octants(pc),a0          ;ptr to octant selector table
                add.w   d1,a0
                clr.w   d1
                move.b  (a0),d1                 ;get octant code
                move.b  lc_bits(a3),d0
                and.b   #$10,d0                 ;get SING bit
                lsr.b   #3,d0
                or.b    d0,d1                   ;put in eventual blitter
                                                ;control bits
                asl.w   line_deltax(a4)         ;2*DS
                move.w  line_deltax(a4),d0
                sub.w   line_deltay(a4),d0      ;2*DS - DL
                bge.s   drawline_b4
                or.b    #$40,d1                 ;set SIGN bit if needed

drawline_b4     move.b  d1,line_oct_bits(a4)    ;save BLTCON1 bits
                                                ;for later
                move.w  d0,line_2S_L(a4)        ;save 2*DS-DL
                move.l  line_screen(a4),a0      ;screen pointer
                move.w  d0,d1
                sub.w   line_deltay(a4),d1      ;2*DS - 2*DL
                move.w  lc_y1(a3),d2
                mulu    line_smod(a4),d2        ;y1 * bytes per raster line
                move.w  lc_x1(a3),d3
                asr.w   #4,d3
                add.w   d3,d3                   ;2*int(x1/16)
                ext.l   d3
                add.l   d3,d2                   ;bitplane offset
                move.w  line_deltax(a4),d3      ;2*DS
                moveq   #0,d4
                move.w  lc_x1(a3),d4
                and.w   #$F,d4                  ;frac(x1/16)
                ror.w   #4,d4                   ;create STARTx bits
                move.w  d4,d5
                swap    d4
                move.w  d5,d4                   ;copy to TEXTUREx bits
                or.b    line_oct_bits(a4),d4    ;create BLTCON1 bits
                swap    d4
                or.w    #$BCA,d4                ;create BLTCON0 bits
                swap    d4                      ;now BLTCON0 high, BLTCON1 low
                move.w  line_deltay(a4),d5      ;get DL
                moveq   #3,d7                   ;no of bitplanes - 1

;NB : trick here. Upper word of d7=0 after moveq #3,d7. Use this as
;the bitplane bit number, doing an addq.w #1,d7 each time, and using
;swap d7 to alternate between bitplane number counter & bitplane bit
;position counter.

drawline_l1     swap    d7                      ;get bitplane bit number
                move.w  d7,d6                   ;ready for test
                addq.w  #1,d7                   ;next bitplane number
                swap    d7                      ;back to loop counter
                btst    d6,lc_bits(a3)          ;bitplane flag set?
                beq.s   drawline_a1             ;no-get next

drawline_b5     btst    #6,DMACONR(a5)          ;blitter ready?
                bne.s   drawline_b5
                move.l  a0,a1                   ;bitplane pointer
                add.l   d2,a1                   ;offset to 1st word of line
                move.w  d0,BLTAPTL(a5)          ;2*DS-DL
                move.w  d1,BLTAMOD(a5)          ;2*DS - 2*DL
                move.l  a1,BLTCPTH(a5)
                move.l  a1,BLTDPTH(a5)          ;bitplane pointers proper
                move.w  d3,BLTBMOD(a5)          ;2*DS
                move.l  #-1,BLTAFWM(a5)         ;set masks
                move.w  #$8000,BLTADAT(a5)      ;1 bit must be set
                move.w  line_smod(a4),BLTCMOD(a5)
                move.w  line_smod(a4),BLTDMOD(a5)       ;bitplane moduli!
                move.l  d4,BLTCON0(a5)          ;set blitter control regs!
                move.w  lc_pattern(a3),BLTBDAT(a5)      ;line pattern
                move.w  d5,d6
                lsl.w   #6,d6
                addq.w  #2,d6                   ;BLTSIZE = 64*DL+2
                move.w  d6,BLTSIZE(a5)          ;draw line

drawline_a1     add.w   line_ssize(a4),a0       ;next bitplane pointer
                dbra    d7,drawline_l1
                move.l  lc_next(a3),d0          ;check if more lines to do
                beq.s   drawline_b6             ;none so exit
                move.l  d0,a3                   ;else set pointer
                bra     drawline_l0             ;and do it

drawline_b6     rts

octants         dc.b    4*4+1
                dc.b    0*4+1
                dc.b    6*4+1
                dc.b    1*4+1
                dc.b    5*4+1
                dc.b    2*4+1
                dc.b    7*4+1
                dc.b    3*4+1

                even

Before documenting the boundary-fill mode of the blitter, some ideas for future experimentation include: using different minterms as described in the Amiga System Programmer's Guide, and tinkering directly with the SING bit to examine its effects. Also, the SIGN bit's effects can be examined, and the effect of using values other than $8000 in BLTADAT. I have not tried all of these, so exercise care. Some of the results could be very interesting, assuming that the Amiga doesn't conk out under the strain...
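For checking register values by hand before trusting them to the hardware, the arithmetic performed by drawline() can be mirrored on a host machine. The following Python sketch is only a model of the calculations in the listing above, not a replacement for it; the function name line_registers and the dictionary layout are my own, and it assumes non-negative coordinates and the same selector bit order (x2<x1, y2<y1, DX<DY) and octant table as the routine.

```python
# Same order as the 'octants' table in the assembly listing.
OCTANT_CODES = [4, 0, 6, 1, 5, 2, 7, 3]


def line_registers(x1, y1, x2, y2, bytes_per_row=40, sing=False):
    """Mirror the register values drawline() computes for one line."""
    dx, dy = abs(x2 - x1), abs(y2 - y1)
    ds, dl = min(dx, dy), max(dx, dy)           # Delta S and Delta L
    selector = ((x2 < x1) << 2) | ((y2 < y1) << 1) | (dx < dy)
    bltcon1 = (OCTANT_CODES[selector] << 2) | 1   # octant bits + LINE
    if sing:
        bltcon1 |= 0x02                         # SING: one pixel per raster line
    if 2 * ds - dl < 0:
        bltcon1 |= 0x40                         # SIGN
    start = x1 & 15                             # START3-0 = x1 MOD 16
    bltcon1 |= start << 12                      # copied to TEXTURE3-0, as the routine does
    return {
        "offset": y1 * bytes_per_row + 2 * (x1 >> 4),   # byte offset of start word
        "BLTCON0": (start << 12) | 0x0BCA,
        "BLTCON1": bltcon1,
        "BLTAPTL": (2 * ds - dl) & 0xFFFF,
        "BLTAMOD": (2 * ds - 2 * dl) & 0xFFFF,
        "BLTBMOD": 2 * ds,
        "BLTSIZE": (dl << 6) + 2,
    }


regs = line_registers(0, 0, 10, 3)
print({name: hex(value) for name, value in regs.items()})
```

For the line from (0,0) to (10,3) this gives DS=3 and DL=10, hence BLTBMOD=6, BLTAPTL=-4 (as a 16-bit value), the SIGN bit set, and BLTSIZE=(10*64)+2.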
Boundary-filling: the blitter has a boundary-fill mode which makes the construction of filled polygons quite simple, once the vagaries of which registers have which values are dealt with.

The blitter's boundary-fill operation is very simple-minded. When it fills a boundary, it recognises the boundary by virtue of the existence of a single pixel marking the boundary. Once it has found that single pixel, the blitter fills all blank pixels until it encounters another single pixel marking the end of the boundary. More correctly, it uses the value of the FCI bit (the Fill Carry In bit) of BLTCON1 to determine what the value of filled pixels should be.

The algorithm is as follows: while a pixel equals zero, that pixel is replaced by the current value of the fill carry, which starts out equal to the FCI bit. For normal fills, FCI is set to zero, ensuring that the initial fill leaves blank space around the boundary. Once a set pixel is encountered, the blitter inverts the carry, and the IFE/EFE bits of BLTCON1 determine what happens to the boundary pixel itself. If the IFE (inclusive fill enable) bit is set, the boundary pixel is left set, so the pixels at both ends of a filled span belong to the fill. If the EFE (exclusive fill enable) bit is set, the boundary pixel takes the value of the carry AFTER the inversion, so the pixel that opens a span stays set and the pixel that closes it is cleared. With EFE set, it is possible to obtain filled polygons with single-pixel corners, whereas corners of filled polygons using IFE will always have at least two pixels at any corner formed by the intersection of two boundary lines forming a peak (such as the apex of a triangle). This makes most sense when I mention that the blitter fills horizontally from right to left, a word at a time, until it has exhausted the row, and then moves on to the next row. Note also that if during the fill of one raster line of pixels, it fails to encounter a boundary pixel, the blitter will continue the fill on the next raster line until encountering a boundary pixel, resulting in weird and wonderful (but not always desirable!) effects.
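The fill rule is easier to follow with a small model of a single raster line. The Python sketch below is a host-side illustration only, and the name fill_row is hypothetical. It rests on one assumption of mine that should be verified against real hardware: that an inclusive fill outputs (source OR carry) and an exclusive fill outputs (source XOR carry), with the carry starting at FCI and toggling on every set source pixel, scanning from right to left.

```python
def fill_row(pixels, exclusive=False, fci=0):
    """Model a blitter fill of one raster line.

    pixels is a list of 0/1 values, leftmost pixel first.
    Assumed rule: inclusive output = src OR carry,
                  exclusive output = src XOR carry.
    """
    out = list(pixels)
    carry = fci
    for i in range(len(pixels) - 1, -1, -1):    # right to left
        src = pixels[i]
        out[i] = (src ^ carry) if exclusive else (src | carry)
        if src:
            carry ^= 1                          # boundary pixel toggles the carry
    return out


row = [0, 0, 1, 0, 0, 1, 0, 0]
print(fill_row(row))                    # inclusive (IFE)
print(fill_row(row, exclusive=True))    # exclusive (EFE)
```

Under that assumption the inclusive fill of the row above yields 00111100 and the exclusive fill yields 00011100, i.e. the exclusive mode drops the boundary pixel that closes the span, which is what makes narrower corners possible.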
Note too that the fill algorithm works only in DESCENDING mode, so the blitter's DESC bit must be set.

So, to fill a boundary, the procedure is as follows: draw the polygon boundary using the SING mode of the blitter's line-drawing function, so that any lines with a slope of less than 45 degrees have only one pixel per horizontal raster line. Having drawn the closed boundary in a suitable memory buffer, point the BLTxPTH/L registers at the last word of the memory buffer, and activate the blitter fill. This is fast: 16 million pixels per second peak speed.

To illustrate the procedure best of all, I present a piece of code that performs this function. This code can be freely ripped out of this DOC file and mutilated ad lib to suit the programmer's personal prejudices (note the use of alliteration! I passed my English Language O-Level! HAH! So what, I hear you say...)

This code uses the line drawing code above to draw the boundary in a memory buffer, then fills the buffer before transferring the contents of the buffer to the screen. It uses a data structure, which I also include in this section, to manage the polygon. Note that I refer to something called the Laurence trick for managing the blitter. This refers to a trick devised by a colleague of mine, a Belgian programmer called Laurence Vanhelsuwe, who first used the trick of setting the modulo to -2 for blitter data block movements. I refer to it frequently in my blitter routines.

* polygon definition structure. See draw_polygon() routine
* for more info.

                rsreset
poly_screen     rs.l    1       ;1st bitplane to draw polygon on
poly_ssize      rs.w    1       ;bitplane size
poly_smod       rs.w    1       ;bitplane width in bytes
poly_buffer     rs.l    1       ;where to draw polygon before rendering
                                ;on the real screen...
poly_wide       rs.w    1       ;width of buffer in words
poly_tall       rs.w    1       ;height in raster lines
poly_border     rs.l    1       ;pointer to line definition structure
poly_xpos       rs.w    1       ;position to plot finished polygon
poly_ypos       rs.w    1       ;on the screen
poly_flag       rs.b    1       ;colours
poly_pad        rs.b    1

poly_sizeof     rs.w    0

* poly_flag : bits 0-3 = colour

* poly_border : points to line definition structure. This in turn has
* a pointer to a line coord list, each entry of which should have the
* SING bit set in lc_bits.

;draw_polygon(a4,a5)
;a4 = ptr to polygon structure definition block
;a5 = ptr to custom chip registers

;creates polygon using blitter line draw in SING mode into a buffer
;followed by blitter fill. Then transfers the whole lot to the screen
;at the desired polygon coordinates. I.e. pre-plots in buffer. Only
;pre-plots in one bitplane, for max efficiency, then moves the entire
;lot over to the actual screen, copying only to those bitplanes
;required. Obviously, if plotting in colour 9 over a background with
;pixel data in colours 2 & 6, these will show up as colour 11 & 15
;pixel data (for a 4-bitplane screen).

;note:for optimum efficiency of memory usage, define your polygon
;with one corner at (0,0) or as close as possible to it.

;d0-d7/a0-a3 corrupt

draw_polygon    move.l  poly_border(a4),a3
                move.l  line_coords(a3),d0      ;get coord list
                beq     draw_poly_done          ;doesn't exist-BYE!
                move.l  d0,a3
                moveq   #0,d0                   ;potential x & y coord
                moveq   #0,d1                   ;maxima

draw_poly_l1    move.w  lc_x1(a3),d2
                move.w  lc_y1(a3),d3
                cmp.w   d2,d0
                bge.s   draw_poly_b1
                move.w  d2,d0                   ;new x maximum

draw_poly_b1    cmp.w   d3,d1
                bge.s   draw_poly_b2
                move.w  d3,d1                   ;new y maximum

draw_poly_b2    move.w  lc_x2(a3),d2
                move.w  lc_y2(a3),d3
                cmp.w   d2,d0
                bge.s   draw_poly_b3
                move.w  d2,d0                   ;new x maximum

draw_poly_b3    cmp.w   d3,d1
                bge.s   draw_poly_b4
                move.w  d3,d1                   ;new y maximum

draw_poly_b4    move.l  lc_next(a3),a3          ;find next set of coords
                move.l  a3,d2                   ;check if they exist
                bne.s   draw_poly_l1            ;yes, back for more testing

;this for debug only

                move.w  d0,debug
                move.w  d1,debug+2

;Use max x&y coord values to determine size of buffer to clear
;using the blitter.

                move.w  d0,d2
                lsr.w   #4,d0                   ;int(max_x/16) = word count
                and.w   #%1111,d2               ;check fraction
                beq.s   draw_poly_b5
                addq.w  #1,d0                   ;1 word more

draw_poly_b5    move.w  d0,poly_wide(a4)        ;WORD count!
                move.w  d1,poly_tall(a4)        ;height in raster lines

;prepare to clear buffer in which polygon is to be drawn.
;Clear using blitter - it's most efficient!

                move.w  d1,d6
                lsl.w   #6,d6
                add.w   d0,d6                   ;this is BLTSIZE
                move.w  d6,debug+4
                clr.w   debug+6

                move.w  #$01F0,BLTCON0(a5)      ;USED only
                                                ;minterms $F0
                clr.w   BLTCON1(a5)             ;normal mode, no shift
                move.l  poly_buffer(a4),BLTDPTH(a5)     ;ptr to buffer
                                                ;to clear
                clr.w   BLTDMOD(a5)             ;D modulo zero
                clr.w   BLTADAT(a5)             ;A data zero for clear
                moveq   #-1,d2
                move.l  d2,BLTAFWM(a5)          ;ensure masks allow data passage
                move.w  d6,BLTSIZE(a5)          ;and clear buffer!

                move.l  poly_border(a4),a3      ;ptr to coord
                                                ;struct for border
                add.w   d0,d0                   ;WORD count to BYTE count
                move.w  d0,line_smod(a3)
                mulu    d0,d1                   ;size of buffer in bytes
                move.w  d1,line_ssize(a3)       ;won't be a long really!
                move.l  poly_buffer(a4),d2
                move.l  d2,line_screen(a3)

                move.l  a4,-(sp)
                move.l  a3,a4                   ;point to line structure
                bsr     drawline                ;& draw the lines
                move.l  (sp)+,a4                ;recover pointer

;now activate blitter fill. Note : that works only if descending mode
;selected. This code does that.
;This is an inclusive fill enable (IFE) type fill, with FCI initially
;zero (non-inverting fill). The exclusive fill enable (EFE) variant
;is left commented out below.

draw_poly_b6    btst    #6,DMACONR(a5)          ;wait till blitter done
                bne.s   draw_poly_b6            ;busy wait (sigh)

                move.l  poly_buffer(a4),a0      ;where border is
                move.w  poly_wide(a4),d1        ;width in words
                move.w  d1,d2
                move.w  poly_tall(a4),d3
                mulu    d3,d1                   ;area size in words
                add.l   d1,d1                   ;area size in bytes
                add.l   d1,a0                   ;descending mode-
                                                ;adjust pointer
                subq.l  #2,a0                   ;point to last word
                                                ;of data proper!
                move.l  a0,BLTDPTH(a5)
                move.l  a0,BLTAPTH(a5)          ;two pointers
                clr.w   BLTAMOD(a5)
                clr.w   BLTDMOD(a5)             ;both moduli zero!
                moveq   #-1,d0
                move.l  d0,BLTAFWM(a5)          ;ensure masks OK
                move.w  #$09f0,BLTCON0(a5)      ;USEA/D, minterms $F0
;               move.w  #$0012,BLTCON1(a5)      ;EFE on, FCI=0, DESC=1
                move.w  #$000A,BLTCON1(a5)      ;IFE on, FCI=0, DESC=1
                move.w  d3,d0                   ;height
                lsl.w   #6,d0                   ;*64
                add.w   d2,d0                   ;add on width in words
                move.w  d0,BLTSIZE(a5)          ;start blitter

;NB : when computing pointers for data blocks in above section,
;use subq.l #2,a0 to point to last words proper, instead of beyond
;the data blocks, otherwise the fill gets confused! Weird things
;happen if you don't do this!
                move.l  poly_screen(a4),a0      ;screen pointer
                move.l  poly_buffer(a4),a2      ;ptr to polygon buffer
                moveq   #0,d0
                move.w  #BP_BYTES,d5            ;no of bytes per screen bitplane
                move.w  poly_ypos(a4),d0
                mulu    poly_smod(a4),d0        ;y coordinate * screen modulo
                add.l   d0,a0
                move.w  poly_xpos(a4),d0
                move.w  d0,d4                   ;save x coordinate
                lsr.w   #4,d0                   ;2*int(x/16)
                add.w   d0,d0
                add.w   d0,a0                   ;now this is initial pointer

                move.w  poly_smod(a4),d2        ;screen modulo
                move.w  poly_wide(a4),d3
                addq.w  #1,d3                   ;word cols + 1:Laurence trick
                add.w   d3,d3                   ;Part 2
                sub.w   d3,d2
                move.w  d2,d0                   ;C,D mods in d0
                swap    d0
                move.w  #-2,d0                  ;A,B mods also

                moveq   #-1,d1                  ;blitter mask values:Laurence
                clr.w   d1                      ;Trick Part 3

                move.w  d4,d2                   ;get x coordinate
                and.w   #%1111,d2               ;frac(x/16)
                ror.w   #4,d2                   ;put in top 4 bits for BLTCONx
                move.w  d2,d3
                or.w    #$0FCA,d2               ;USEA/B/C/D, minterms $CA
                swap    d2
                move.w  d3,d2                   ;create BLTCONx bits

                moveq   #0,d3
                move.w  poly_tall(a4),d3
                and.w   #$3FF,d3
                lsl.w   #6,d3
                move.w  poly_wide(a4),d6
                addq.w  #1,d6
                and.w   #$3f,d6
                add.w   d6,d3                   ;this is BLTSIZE!!

                moveq   #3,d7                   ;no of bitplanes - 1
                                                ;(dbra loop count)

;here transfer polygon to screen bitplanes according to colour
;specifier. a0 = ptr to screen location, d0 = moduli (C,D high
;word, A,B low word), d1 = blitter mask values, d2 = BLTCON0
;and BLTCON1 control words, d5 = size of 1 screen bitplane in
;bytes, d3 = BLTSIZE value pre-calculated (will stay the same
;size throughout the operation) and a2 = ptr to polygon buffer.
;So leave d0-d5/a0/a2 alone while within the loop! Leave a4 alone
;anyway or the routine will crash! Freely alter d6/a1/a3. This
;info in case you have any flash refinements to make. Note one of my
;favourite tricks-using SWAP & sticking counters/other data in both
;words of 1 reg.

;Note that the choice of sources A and B doesn't matter here because
;they're both the same! If you use different sources, remember to
;make Source A the MASK, Source B the DATA, Source C the BACKGROUND!
draw_poly_l2    swap    d7                      ;bitplane bit no
                move.w  d7,d6                   ;copy
                addq.w  #1,d7                   ;next bitplane no
                swap    d7                      ;back to counter
                btst    d6,poly_flag(a4)        ;this bitplane?
                beq     draw_poly_b8

draw_poly_b7    btst    #6,DMACONR(a5)          ;wait till blitter done
                bne.s   draw_poly_b7            ;busy wait (sigh)

                move.l  a0,BLTDPTH(a5)          ;ptr to screen area to plot to
                move.l  a0,BLTCPTH(a5)
                move.l  a2,BLTAPTH(a5)          ;ptr to polygon buffer
                move.l  a2,BLTBPTH(a5)
                move.w  d0,BLTAMOD(a5)          ;A,B moduli -2:Laurence trick
                move.w  d0,BLTBMOD(a5)          ;Part 1
                swap    d0                      ;get C,D moduli
                move.w  d0,BLTDMOD(a5)          ;C,D moduli
                move.w  d0,BLTCMOD(a5)
                swap    d0                      ;recover A,B moduli again
                move.l  d1,BLTAFWM(a5)          ;blitter masks (see above)
                move.l  d2,BLTCON0(a5)
                move.w  d3,BLTSIZE(a5)          ;start blitter

draw_poly_b8    add.w   d5,a0                   ;next bitplane
                dbra    d7,draw_poly_l2         ;continue

draw_poly_done  rts


Hardware:Sound Management

For those who like the Jean-Michel Jarre sound effects that the Amiga is able to reproduce (guess who's bought all of his albums!) and wish to reproduce a similar gee-whizz set of sound effects, this is the section for you. However, I warn anyone who hasn't studied maths to a decent level of the horrors to come. Sound synthesis using the additive method of the Paula chip is best understood by those who know something about Fourier analysis! I shall try to make this as painless as possible.

All sound waves, plotted graphically, are formed from lots of sine (and/or cosine) curves added together. The fun part about adding sine and cosine functions together is that you get a more complex waveform as a result. This is known by the lofty title of the principle of superposition of waveforms. So any waveform can be broken down into components of the form

        A * sin((n * x) + p)

where A is the amplitude of the component (which equates approximately to the volume), n is a constant multiplier from 1 onwards, and p is the phase angle. The phase angle describes how far along the x-axis the curve is shifted.
As it happens to be true that cos(x) = sin(x+w) where w = 90 degrees or pi/2 radians, all components of a sound wave can be represented as above. As an exercise, to see this in action, try plotting a graph of the function

        cos(x) - (1/3)cos(3x) + (1/5)cos(5x) - ...

for as many terms as you can bother to calculate. You'll find the waveform is interesting. (I use cosines here because the peak-to-trough sum quoted below is the one the cosine form gives, at x = 0 and x = 180 degrees.)

This leads directly on to some musical definitions, and their related mathematical definitions. The amplitude of the curve, namely the distance between the highest peak and the lowest trough of the curve, defines the volume of the sound. The pitch is related to the frequency, which in turn is in direct relation to the constant n in the first expression above. Where one of the components of the curve has a large constant multiplier in front of it, i.e., the value of A is large, that component is the dominant one and defines the pitch of the note. This is the principal note. The lesser components are known as the harmonics, especially when they are in some numeric relation to the principal note. In the second expression above, the second and third components define harmonics for the note, the principal is the term cos(x), and the amplitude of the whole is the distance between the peaks & troughs, which for this waveform should be 2*(1-1/3+1/5-...). Anyone now worrying about the amplitude of this being infinite, because of it being an infinite series, be reassured. The series may be infinite, but for good mathematical reasons it ends up as a finite value! Trust me-I did this at university!

Anyone wishing to construct a waveform using this idea, be warned. The ultimate result of embarking on this course is to immerse oneself in the vagaries of Fourier analysis, so called because Joseph Fourier, the French mathematician, first published a treatise on the subject (which he became interested in precisely because he wished to analyse music mathematically).
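For anyone who doubts the reassurance about the infinite series, here is a minimal sketch in Python (not Amiga code; the function name is my own invention for illustration) that simply adds up 1 - 1/3 + 1/5 - ... term by term. The partial sums settle towards pi/4, so twice that, the peak-to-trough figure quoted above, settles towards pi/2.

```python
import math

def leibniz_partial_sum(terms):
    """Sum the first `terms` terms of 1 - 1/3 + 1/5 - 1/7 + ..."""
    total = 0.0
    for k in range(terms):
        sign = -1.0 if k % 2 else 1.0   # alternate +, -, +, -
        total += sign / (2 * k + 1)     # denominators 1, 3, 5, ...
    return total

# Compare twice the partial sum with pi/2 as more terms are taken.
for terms in (10, 1000, 100000):
    print(terms, 2.0 * leibniz_partial_sum(terms), math.pi / 2.0)
```

The convergence is slow (the error shrinks roughly as one over the number of terms), which is one more reason for reading sample values off a graph instead.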
If you carry this through, you'll find yourself immersed in masses of trigonometrical integration and orthogonality computations, and if you don't know what that means, you're best avoiding this method!

Fortunately, there are other ways of creating your waveform. One can simply draw a nice-looking graph of what looks like a waveform, and instead of working out the x,y values the hard way as above, simply read them off the graph. It is these values that Paula uses!

Now, it is time to discuss other effects. The frequency of the wave need not be constant. Varying the frequency rapidly about the principal note by a small amount creates vibrato. Slowing the rate of variation gives rise to tremolo. The shape of the curve above defines a quality called timbre, and this is the reason that so many musical instruments sound different even if they all play the same note. Their sounds, when taken via a microphone, sent through an amplifier and displayed on an oscilloscope screen, give rise to a whole variety of different curve shapes. It was these shapes that led Fourier to perform his mathematics which describes them. And yes, Paula can reproduce all of these!

By way of example, a square wave has the equation

        y = sin(x) + (1/3)sin(3x) + (1/5)sin(5x) + ...

which is slightly different to the first example. A sawtooth wave has the equation

        y = sin(x) + (1/2)sin(2x) + (1/3)sin(3x) + ...

and for your education, anyone with a Yamaha DX7 synthesiser will know that this instrument also builds its sounds out of sine waves, though it combines them by frequency modulation rather than by simple addition (which is why it is so hard to play one!).

Noise, as opposed to music, is defined as randomly superposed frequencies. White noise, as it is called, is a mathematical idealisation that is impossible to achieve in practice, and consists of all possible frequencies superposed one upon another.
Pink noise is a more limited selection of such frequencies played simultaneously, and good pink noise generators are capable of generating all audible frequencies simultaneously, thus providing a good approximation to white noise. Noise curves, when plotted, look like the paths described by spiders walking after immersion in vodka. Anyone who has seen sound traces on oscilloscopes will know what I mean.

Having finished the preamble, it is now time to consider how to implement this on Paula. First, let us look at the register list for Paula:

        Name            Offset  Function
        ----            ------  --------
        ADKCON          09E     Audio & Disc Controller (Cross-reference #5)

        AUD0LCH         0A0     High word, audio data address, channel 0
        AUD0LCL         0A2     Low word, audio data address, channel 0
        AUD0LEN         0A4     Data length, channel 0
        AUD0PER         0A6     Period duration, channel 0
        AUD0VOL         0A8     Volume, channel 0
        AUD0DAT         0AA     Audio data, channel 0 (to D/A converter)

        AUD1LCH         0B0     High word, audio data address, channel 1
        AUD1LCL         0B2     Low word, audio data address, channel 1
        AUD1LEN         0B4     Data length, channel 1
        AUD1PER         0B6     Period duration, channel 1
        AUD1VOL         0B8     Volume, channel 1
        AUD1DAT         0BA     Audio data, channel 1 (to D/A converter)

        AUD2LCH         0C0     High word, audio data address, channel 2
        AUD2LCL         0C2     Low word, audio data address, channel 2
        AUD2LEN         0C4     Data length, channel 2
        AUD2PER         0C6     Period duration, channel 2
        AUD2VOL         0C8     Volume, channel 2
        AUD2DAT         0CA     Audio data, channel 2 (to D/A converter)

        AUD3LCH         0D0     High word, audio data address, channel 3
        AUD3LCL         0D2     Low word, audio data address, channel 3
        AUD3LEN         0D4     Data length, channel 3
        AUD3PER         0D6     Period duration, channel 3
        AUD3VOL         0D8     Volume, channel 3
        AUD3DAT         0DA     Audio data, channel 3 (to D/A converter)

Complications arise because ADKCON controls the disc and UART as well as the sound output. However, ADKCON works on the SETIT principle in an identical fashion to DMACON and other like control registers, so that the programmer can choose to set only those bits pertinent to the sound system.
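The SETIT principle can be modelled in a few lines. What follows is only a sketch in Python, not Amiga code, and the function name is hypothetical: bit 15 of the value written decides whether the other 1 bits in that value are set or cleared, and every bit written as 0 is left exactly as it was.

```python
SETIT = 0x8000   # bit 15: 1 = set the named bits, 0 = clear them

def write_setit_register(current, value):
    """Model one write to a SETIT-style register (ADKCON, DMACON...).

    `current` is the register's contents before the write,
    `value` is the word the processor writes.
    """
    bits = value & 0x7FFF               # bits 14-0 name the bits to change
    if value & SETIT:
        return current | bits           # set only the named bits
    return current & ~bits & 0x7FFF     # clear only the named bits

# Setting bit 0 leaves the (disc) bits 12 and 8 untouched...
print(hex(write_setit_register(0x1100, 0x8001)))   # 0x1101
# ...and clearing bit 0 again restores the original contents.
print(hex(write_setit_register(0x1101, 0x0001)))   # 0x1100
```

This is why a sound routine can twiddle its own bits without first reading the register and without disturbing the disc or UART settings.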
Only the low 8 bits are used for sound, and bit 15 acts as the SETIT bit (#5) as for DMACON. The bit assignments for sound are:

        Bit     Name    Function
        ---     ----    --------
        15      SETIT   see DMACON
        7       USE3PN  Audio channel 3 modulates nothing
        6       USE2P3  Channel 2 modulates period of channel 3
        5       USE1P2  Channel 1 modulates period of channel 2
        4       USE0P1  Channel 0 modulates period of channel 1
        3       USE3VN  Channel 3 modulates nothing
        2       USE2V3  Channel 2 modulates volume of channel 3
        1       USE1V2  Channel 1 modulates volume of channel 2
        0       USE0V1  Channel 0 modulates volume of channel 1

These usages will be explained later.

There exist two possible ways of creating a sound. First, one could draw a graph of one wave of the waveform, digitise it (i.e., read off the y values at equally spaced x intervals, and rescale these y values so that they lie within the range -127 to +127) and put the data into CHIP RAM. Then, set Paula up to play the waveform, and write a piece of interrupt code to respond to an audio interrupt by replaying the waveform repeatedly. This limits the note to one waveform, but the pitch can be varied within the interrupt code.

The second method is to digitise an ENTIRE sound into CHIP RAM, and set Paula up to play the whole lot in one go. This is obviously a task that is not suited to hand-calculation, and to assist with this, special digitiser hardware exists that can be interfaced to the Amiga to perform this function, this hardware being known as a SAMPLER. By using a sampler, an entire single record can, in theory, be digitised and played on the Amiga (for an example of what is possible, listen to the Xenon 2 Megablast soundtrack by Bomb The Bass!).

Problems occur because of the use of 8-bit data resolution within the sound system. A certain quality can be achieved, but for comparison, a compact disc player uses 16 bits for hi-fi quality.
So anyone toying with the idea of reproducing studio quality classical music on the Amiga directly will possibly be disappointed: there may be a noticeable change in quality. Because all analogue to digital conversion involves sampling errors, due to the rounding effect of A/D conversion, a quantity called the quantization error results. For an 8-bit system, the maximum possible quantization error is 1/256 of the digitised value. The magnitude of the quantization error is directly proportional to a noise value called the quantization noise, which may have a deleterious effect upon sound quality (hence the use of 16 bits by CD players, which reduces the quantization error to 1/65536 of the digitised value).

Now, the first quantity that requires calculation is the sampling rate. This is the rate at which the sampler 'snapshots' the analogue sound data and converts it into digital data. If the data is not sent to Paula at the same rate, then the sound will be shifted in frequency with the possibility of distortion occurring (but of course, there is no reason why this cannot be done deliberately for special effects!).

If using one waveform, the number of samples per waveform is equal to the number of y values you have decided to use. To make sine curves conform more closely to the ideal, increase the number of samples.

A simple example illustrates this. Let us reproduce a pure tone, as represented by a pure sine wave. One complete wave consists of the curve from 0 degrees to 360 degrees. Let us split this up into 16 samples. So we require the values of sin(x) from 0 to 360 degrees in 16 steps, scaled so that the end results lie within the range -127 to +127. Because sin(x) varies from -1 to +1, we therefore want the values of 127*sin(x) over this range.
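The scaling just described is easy to check by machine. This is a minimal sketch in Python (not Amiga code; the function names and default arguments are my own, purely for illustration) that computes 127*sin(x) at 16 equally spaced points, rounds each to a whole number, and also shows the twos-complement bytes that would end up in memory.

```python
import math

def sine_table(samples=16, peak=127):
    """One cycle of a sine wave, scaled to +/-peak and rounded to integers."""
    return [round(peak * math.sin(2.0 * math.pi * i / samples))
            for i in range(samples)]

def as_signed_bytes(values):
    """Twos-complement byte image of the table (what ends up in CHIP RAM)."""
    return bytes(v & 0xFF for v in values)

table = sine_table()
print(table)
print(as_signed_bytes(table).hex())
```

Increasing `samples` gives a smoother curve at the cost of a longer table, which is the trade-off mentioned above.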
The values thus required are:

           0        48.6007     89.8025    117.3327
         127       117.3327     89.8025     48.6007
           0       -48.6007    -89.8025   -117.3327
        -127      -117.3327    -89.8025    -48.6007

Rounding these to the nearest whole number (which introduces the quantization error unfortunately, and cannot be avoided because Paula can't handle floating point numbers!) and putting this list of bytes into CHIP RAM, yields the desired sample data. Since Paula uses standard twos-complement bytes, and it is usual for assemblers to generate these directly from negative numbers, we can insert these values directly into a DC.B list, e.g.:

sine_wave       dc.b    0,49,90,117
                dc.b    127,117,90,49
                dc.b    0,-49,-90,-117
                dc.b    -127,-117,-90,-49

Now, we point Paula's data address registers at the start of the list. This is usually done using something like

                LEA     $DFF000,A5
                LEA     sine_wave(PC),A0
                MOVE.L  A0,AUD0LCH(A5)

Now we tell Paula how long the sample list is in WORDS. There are 16 actual data bytes. The value to write to AUD0LEN is thus 16/2 = 8. For lists with an odd number of bytes, e.g., 17 instead of 16, use 17/2 rounded DOWN to the nearest integer, i.e., 8 again. Use the instruction:

                MOVE.W  #8,AUD0LEN(A5)

to set the list length. For longer lists, use two labels, one at the start of the list, one at the end of the list, and use something like

                MOVE.W  #(end-start)/2,AUD0LEN(A5)

(as in the Amiga System Programmer's Guide).

Now we select the output volume. Since this is going to be a constant value, and we can choose any value from 0 to 64, let us choose half of full volume:

                MOVE.W  #32,AUD0VOL(A5)

Now we need to choose our sampling rate. In this case, the sampling rate will affect the frequency of the sound. So we can use the relation

        F = S/N

where F is the frequency, S is the sampling rate, and N is the number of samples per cycle. In this case, N=16. Now we cannot specify the frequency as a specific number of Hertz, but need to relate it to the number of bus cycles. A bus cycle is 279.365 nanoseconds.
We need to compute the sample period as given by the relation

        P = 1/(S * 2.79365 * 10^-7)

where 10^-7 means 10 to the power of -7. To play our sine wave at International A, or 440 Hz, we need to transform our frequency relation, and yield

        S = F * N

Using F=440, N=16, we have S = 7040 Hz. Now this will give P as the value

        1/(7040 * 2.79365 * 10^-7)

or 508.4583. Rounding, this yields 508. We therefore set up Paula using

                MOVE.W  #508,AUD0PER(A5)

to generate the appropriate International A note. The resulting note will in actuality deviate by 0.4 Hz from the true value, but only a listener possessing perfect pitch (particularly a classical musician) will notice the difference.

There exists some limit upon the AUDxPER values. Each audio channel has one DMA slot per raster line, and hence one word, or two samples, can be read in one raster line. Thus the smallest possible value for the sampling period is 114 (NOT 124 as given in the Amiga System Programmer's Guide!). The value is obtained as follows: one raster line equals 63.5 microseconds. So one second contains 1/(63.5 * 10^-6) raster lines, which rounds to 15748 raster lines. Two samples can be read per raster line, or 31496 samples per second when read at maximum speed. This yields P = 113.65, or rounded up, 114. Selecting a value lower than 114 will result in some data words being output twice, as the DMA system cannot fetch the next data word on time. This gives a sampling rate of

        1/(114 * 2.79365 * 10^-7)

or 31399 Hz. Obviously, one can sample at a slower rate, and even sample using P=65535 for weird effects!

Having set up the registers, we need to activate the audio DMA. The instruction

                MOVE.W  #$8201,DMACON(A5)

performs this (it sets AUD0EN and DMAEN just to be sure). At this point, Paula will play the sound. The example above will be heard as a single note playing continuously, and eventually you will become so sick of hearing it you'll turn the volume off on your monitor!
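The period arithmetic above can be reproduced with a short sketch in Python (not Amiga code; the names are my own for illustration, and the 279.365 ns bus cycle is the figure used in this text). It computes the AUDxPER value for a wanted note, the frequency actually obtained after rounding, and the sampling rate that corresponds to the minimum period of 114.

```python
BUS_CYCLE = 279.365e-9   # seconds per bus cycle, the figure used in the text

def period_for(frequency_hz, samples_per_cycle):
    """AUDxPER value for a waveform of N samples played at the given pitch."""
    sampling_rate = frequency_hz * samples_per_cycle      # S = F * N
    return round(1.0 / (sampling_rate * BUS_CYCLE))       # P = 1/(S * cycle)

def sampling_rate_for(period):
    """Sampling rate in Hz produced by a given AUDxPER value."""
    return 1.0 / (period * BUS_CYCLE)

def frequency_for(period, samples_per_cycle):
    """Pitch actually produced once the period has been rounded."""
    return sampling_rate_for(period) / samples_per_cycle  # F = S / N

p = period_for(440, 16)
print(p)                            # period for International A, 16 samples
print(frequency_for(p, 16))         # the pitch actually obtained
print(int(sampling_rate_for(114)))  # sampling rate at the minimum period
```

Because the period must be a whole number, the obtainable pitches form a grid, and the grid gets coarser as the period gets smaller.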
This example sounds like the horrible 'beep' accompanying the BBC test card...

Now for some details. Paula has internal registers for data fetching unlike the blitter (why the hell couldn't the blitter have them?), and when Paula is started up, the AUDxLCH/L register pair contents are copied to these internal registers before data fetching. Thus the initial values of AUDxLCH/L are preserved. The same is true for AUDxLEN, AUDxVOL and AUDxPER (otherwise you wouldn't hear the same 'beep' all the time when playing the example sound above!). Only when the audio DMA is turned off will the sound stop!

Now since the AUDxLCH/L values, etc, are copied to internal registers while playing, the processor can supply new values to change the note on a continual basis. Uninterrupted sound generation can thus continue.

Paula generates an interrupt at the end of each complete sound output cycle. By enabling the audio interrupt bits, and linking in an interrupt handler to the audio interrupt vector (IPL4), it is possible to change which note is played under interrupt control. Be warned that for high frequencies the interrupt occurs VERY OFTEN and if your interrupt code cannot respond to it fast enough, the supervisor stack will overflow with unsatisfied interrupt requests and crash the machine! As for all interrupt code, the interrupt handler must clear the INTREQ bit for the appropriate channel after checking which channel caused the interrupt to occur, and then perform whatever function the user desires, be it changing the frequency of a note such as played in the above example, or performing more complex changes such as switching between waveforms.

Modulation: the term modulation is used to describe the process by which one waveform can be used to change the output from another. Normally two quantities are subject to modulation, volume and frequency. When a waveform A is used to alter the volume at which waveform B is output, waveform A is described as a VOLUME ENVELOPE.
When the waveform A is used to alter the pitch of the notes played, waveform A is then described as a TONE ENVELOPE.

Volume envelopes are the simplest to describe. With a volume envelope, it is possible to generate full ADSR synthesised sound. ADSR stands for Attack, Decay, Sustain, Release, and describes four steps in the generation of a sound. Attack describes how a sound builds up in volume from zero to a maximum. Sustain describes the section where the volume is maintained at a constant volume for a specific time period. Decay and Release both describe how sound volume decreases, the Decay phase(s) following Attack/Sustain phases, and the Release phase occurring at the end of the note.

When using a volume envelope, the data for the volume envelope is entered as for normal sample data. A second area of memory is used for the sample data itself. For example, let channel 0 modulate channel 1. For this process, AUD0VOL is turned off (channel 0 is not used for actual sound generation), AUD0LCH/L is pointed to the envelope data, and AUD0PER can be set either to the same rate as AUD1PER for the actual sound data, or to a different value, according to the effect desired. AUD1LCH/L is pointed to the sample data, AUD1VOL is set to the maximum value (64), and AUD1PER to the sampling rate pertinent for the sample data. Lastly, set USE0V1 in ADKCON and set Paula running.

A frequency envelope is defined in a different way. Instead of defining it as a list of sample data bytes, it is defined as a list of sampling rate words as written to the AUDxPER registers. If the base note of the sound has a principal frequency of 440 Hz as in the example above, and thus has a sampling period value of 508, the values in the frequency envelope can vary around this value to create vibrato/tremolo effects, or alternatively, vary in one direction to produce ascending/descending pitch.
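By way of illustration only, a frequency envelope of this kind could be prepared as follows. This is a sketch in Python (not Amiga code); the function name, the depth of 8 and the 16 steps are all arbitrary assumptions of mine, chosen just to show a list of period words wobbling around the base value of 508.

```python
import math

def vibrato_periods(base_period=508, depth=8, steps=16):
    """List of AUDxPER-style words varying sinusoidally about base_period.

    `depth` is the largest excursion from the base period and `steps`
    is the number of words in one complete wobble; both are illustrative.
    """
    return [base_period + round(depth * math.sin(2.0 * math.pi * i / steps))
            for i in range(steps)]

envelope = vibrato_periods()
print(envelope)
print(min(envelope), max(envelope))
```

Each entry would be stored as one word in the envelope list, in the order given.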
The larger the value used, the lower the sampling rate used and the lower the frequency resulting when the sample is played. Depending upon the rate of sampling used for the frequency envelope, either vibrato or tremolo can be produced.

Note that it may be possible to set both USE0V1 and USE0P1 bits. If this is done, the data for channel 0 will change BOTH the volume and the frequency of the output from channel 1, but since there is a fundamental difference between frequency and volume envelope data, the effects cannot readily be predicted beforehand.

Sound generation problems: in the discourse above, it was mentioned that the maximum sampling rate possible was 31399 Hz (ignore the value used in the Amiga System Programmer's Guide-it is WRONG). For a digitised sine wave of 16 sample data points, the maximum frequency possible for this sine wave is 1962.4 Hz. To generate higher frequencies, the number of samples must be reduced, so that, for example, a sine curve with 8 samples will have a maximum possible reproduction frequency of 3924.8 Hz. But as the number of samples decreases, the quality of the sine wave also decreases, until in the limiting case, the result is an annoying buzzing noise.

One way of circumventing the problem is to digitise multiple waves in the waveform instead of just one. This allows higher-frequency waveforms to be generated without the problems of waveform degeneration. But a second problem occurs with the higher frequencies. There exists a phenomenon associated with high-frequency sound generation called aliasing distortion. This comes in two forms, one as the sum of the sampling rate and the desired frequency, and the second as the difference. So, for example, if the sound is of 3KHz frequency, and the sampling rate is 12KHz, the aliasing distortion will occur at 9KHz and 15KHz.

To eliminate this aliasing distortion, Paula contains a device known as a low-pass filter.
This has been placed between the output of the D/A converters and the audio connectors in Paula, and its effect is to allow all of the frequencies below 4KHz to pass undisturbed. Frequencies between 4KHz and 7KHz are diminished in amplitude, and frequencies above 7KHz are not passed at all. In the example cited above, the 3KHz main signal is allowed to pass to the speaker undisturbed, but both of the aliasing distortion frequencies lie above the 7KHz cutoff point and are eliminated. Should the sampling rate be reduced to 9KHz, however, the aliasing distortions now occur at 12KHz and 6KHz, and the 6KHz aliasing distortion, although diminished, is still allowed past to the speaker.

So, when determining sampling rates, to ensure destruction of the aliasing distortion and passage of the desired sound signal, the sampling rate must be chosen so that it is greater than the frequency of the highest frequency component of the sound PLUS the 7KHz cutoff point. So for a sound with a maximum frequency component of 4KHz, the sampling rate must be greater than 4KHz+7KHz, or 11KHz. Needless to say, given the upper limit of 31KHz on the sampling rate as determined earlier, the highest possible frequency component of ANY sound can be no higher than 24KHz. Many people lack the ability to hear sounds above 16KHz, but some exceptional persons can hear up to 24KHz before the sound exceeds their audible range, including one or two classical musicians of note. Most music does not contain notes above 3KHz, so very high sound frequencies are generally for specialist applications.

Fundamental notes are not the only consideration, however. To maintain the quality of the music, the timbre must be maintained. This is directly related to the shape of the curve of the waveform, which in turn depends upon maintaining the existence of higher frequency harmonics.
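Returning for a moment to the sampling-rate rule stated above, it can be written down as a short sketch in Python (not Amiga code; the names are my own, and the 7KHz cutoff is simply the figure quoted in this text). It lists the two aliasing frequencies for a given sound and sampling rate, and checks whether the lower one clears the cutoff.

```python
CUTOFF_HZ = 7000   # frequencies above this are not passed (figure from the text)

def alias_frequencies(signal_hz, sample_rate_hz):
    """The difference and sum frequencies at which aliasing appears."""
    return (sample_rate_hz - signal_hz, sample_rate_hz + signal_hz)

def aliases_removed(signal_hz, sample_rate_hz, cutoff_hz=CUTOFF_HZ):
    """True if even the lower alias lies above the filter cutoff."""
    return sample_rate_hz - signal_hz > cutoff_hz

print(alias_frequencies(3000, 12000), aliases_removed(3000, 12000))
print(alias_frequencies(3000, 9000), aliases_removed(3000, 9000))
```

The first line is the 12KHz example (both aliases removed); the second is the 9KHz example, where the 6KHz alias gets through.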
Harmonics are often separated by octaves, and an increase of one octave results in the frequency being doubled (octaves and frequencies are related logarithmically). So a 3KHz note with several harmonics may have most or all of its harmonics destroyed by the low-pass filter. In the extreme case, a square wave which consists theoretically of an infinite number of harmonics may be reduced to a sine wave by the low-pass filter.

Of course, digitising whole sections of music as opposed to single notes solves this problem, but at the expense of memory. If the sampling rate used by a hardware sampler is 20KHz, and one second of music is sampled, the data takes up 20K of memory. This limits the maximum amount of sound that can be digitised in one go at that sampling rate to 25.6 seconds (assuming that you can fill all of CHIP RAM with the data!). For lower sampling rates more data can be digitised, but the memory problem still remains. One solution is to digitise sections at a time, compress the resulting data using a suitable data compression algorithm, then decompress them into CHIP RAM and play them in sequence. Needless to say, the decompression algorithm needs to be fast, and synchronised to the sound playing. The data must not be decompressed into an area of memory already being used for sound reproduction, hence the decompression of data MUST NOT catch up with the sound generation! Grabbing the data from disc is another possible way around the problem: directly accessing the disc tracks yields a data transfer rate of 62500 bytes per second peak, and can hence be used to lob in data piecemeal, provided that there is sufficient disc space. Again, compression/decompression may be needed.


Hardware:Disc Management

Disc drives on the Amiga are controlled from two sources, the CIAs and Paula. The CIAs govern such features as motor control, drive selection (#6) and head movement. The FLAG pin of CIA-B is connected to generate the /INDEX signal of the disc drive.
The ADKCON register mentioned above in the sound management section is also used to control the disc controller. Bits 14-8 of this register are used for disc control, and the bit assignments are:

        Bit     Name            Function
        ---     ----            --------
        15      SETIT           see DMACON
        14,13   PRECOMP         Sets precompensation
        12      MFMPREC         0=GCR encoding, 1=MFM encoding
        11      UARTBRK         used for the UART, not used here
        10      WORDSYNC        1=enable synchronisation
        9       MSBSYNC         1=enable GCR synchronisation
        8       FAST            0=4 microseconds/bit(GCR)
                                1=2 microseconds/bit(MFM)

By appropriate programming of this register, it is possible to enable Amiga disc drives to read either MFM or GCR encoded discs. For explanation of the terms MFM and GCR, see a comprehensive data source on disc drive operation, as I have temporarily forgotten the relevant information. Both encoding mechanisms require appropriate software to generate the encoded data before writing to disc, and to decode the data read from the disc. The appropriate routines for data encoding/decoding for MFM form part of the trackdisk.device - the Amiga uses MFM encoding for its discs as the default choice.

The need for such encoding is explained simply - raw data cannot be written as is to disc. Because of limitations imposed by, among other things, the laws of electromagnetism, the data has to be encoded in a manner allowing the data to be securely stored in the form of magnetic flux changes on the disc surface. I suggest searching out a concise reference work on the subject before attempting one's own MFM/GCR encoding/decoding software!

As to the existence of these two systems, they have their own history. However, GCR coding is used for Apple Macintosh discs, and there exists a Macintosh emulator for the Amiga called A-MAX II using Paula's GCR encoding ability to read Mac discs directly.

Precompensation is a little bit more difficult to explain. When the disc system writes data to the disc, the data is written as a series of flux changes onto the magnetic medium.
The time for each flux change to occur is called the half-zero-bit length, and the gap synthetically introduced to increase data security for high-speed transfers is the precompensation. Normally the faster the data transfer, the higher the precompensation needed to avoid data read/write errors. Four possible settings are provided for by Paula.

Now we have the ability to set up the disc controller to provide the encoding system, precompensation and disc controller clock rate. Now we need to tell Paula where the disc transfer buffer lies. This involves the registers DSKPTH (offset $020) and DSKPTL (offset $022). These two registers provide a pointer into a buffer in CHIP RAM. This buffer can be used for either read or write operations.

Having informed Paula of the disc transfer buffer location, we need to inform Paula of 1) the length of data to transfer, and 2) the data direction (read/write). The register DSKLEN (offset $024) performs these functions in one go. The bit assignments of DSKLEN are:

        Bit     Name    Function
        ---     ----    --------
        15      DMAEN   Enable Disc DMA
        14      WRITE   0=read data from disc, 1=write data to disc
        13-0    LENGTH  14-bit number : no of WORDS to transfer

When DMAEN is set to 1, the data transfer is theoretically enabled. The use of the word 'theoretically' is deliberate, because Paula contains a mechanism to prevent accidental disc writes. Firstly, the DSKEN bit in DMACON must be set, and even if it is set, the DMAEN bit of DSKLEN (CARE! confusion may arise in the names here!) must be set TWICE before the operation is executed. In addition, the WRITE bit must only be set for a genuine write operation! If you try changing the value of this bit between the two writes to DSKLEN, a weird and wonderful sequence of events may just occur, leading among other things to Paula shuffling off her mortal coil...
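The layout of the DSKLEN word can be sketched as follows. This is Python, not Amiga code, and the function name is hypothetical; it merely packs the three fields from the table above into one 16-bit value and refuses a length that will not fit in 14 bits.

```python
DMAEN = 0x8000         # bit 15: enable disc DMA
WRITE = 0x4000         # bit 14: 1 = write to disc, 0 = read from disc
LENGTH_MASK = 0x3FFF   # bits 13-0: number of WORDS to transfer

def dsklen_value(words, write=False):
    """Build the word that is written (twice) to DSKLEN to start a transfer."""
    if not 0 <= words <= LENGTH_MASK:
        raise ValueError("length must fit in 14 bits (0 to 16383 words)")
    value = DMAEN | words
    if write:
        value |= WRITE
    return value

print(hex(dsklen_value(0x1900)))               # read of $1900 words
print(hex(dsklen_value(0x1900, write=True)))   # write of $1900 words
```

The same value must be used for both writes to DSKLEN, so it makes sense to build it once and keep it in a register.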
The orderly sequence for DSKLEN is as follows:

        MOVE.W  #0,DSKLEN(A5)           ;turn off disc
        MOVE.W  #$8210,DMACON(A5)       ;enable disc DMA
                                        ;just in case
        LEA     disbuf(PC),A0
        MOVE.L  A0,DSKPTH(A5)           ;set up disc buffer
        CLR.W   D0
        BSET    #15,D0                  ;set DMAEN
        BSET    #14,D0                  ;set WRITE if wanted
        MOVE.W  #LENGTH,D1              ;amount of data
                                        ;to transfer
        ADD.W   D0,D1
        MOVE.W  D1,DSKLEN(A5)           ;set up disc
        MOVE.W  D1,DSKLEN(A5)           ;now execute!
        ...                             ;here wait until the
                                        ;disc DMA is finished...
        MOVE.W  #0,DSKLEN(A5)           ;and shut off when done.

The DSKBLK interrupt is provided in the INTREQ/INTENA registers so that the
processor can discover when the disc controller has finished. When the
number of words specified in DSKLEN has been transferred in whichever
direction has been chosen, the DSKBLK interrupt is signalled. It is
generated when the last word of data is transferred.

To examine the current status of the disc controller, there exists the
DSKBYTR (offset $01A) register, which is assigned as follows:

Bit     Name       Function
---     ----       --------
15      BYTEREADY  Signals that byte in lower 8 bits is valid
14      DMAON      Indicates if disc DMA is active. DMAON set to 1
                   when DMAEN of DSKLEN is 1 AND DSKEN of DMACON
                   is 1 (CARE WITH NAMES!)
13      DSKWRITE   Copy of WRITE in DSKLEN
12      WORDEQUAL  Disc data equals DSKSYNC
11-8               Unused
7-0     DATA       Current data byte from disc

Incidentally, it is possible to use this data register to read the data from
the disc using the 68000 instead of using DMA (should you want to!).
Whenever a complete byte is received from the disc, the disc controller sets
the BYTEREADY bit. The processor then knows that the data in the lower 8
bits is a valid data byte. After DSKBYTR is read, the BYTEREADY flag is
automatically reset. The DMA system normally performs this without
intervention from the 68000.

Sometimes, instead of reading an entire track of data at once, the
programmer may wish to read data starting at a specific position.
The programmer uses the DSKSYNC register (offset $07E), together with the
WORDSYNC bit of ADKCON, to determine where the disc data transfer will
begin. The value in DSKSYNC is a bit pattern (the synchronisation word),
not a count or an offset; the standard sync word on Amiga MFM discs is
$4489. With WORDSYNC set, the disc controller compares the incoming data
with the value in DSKSYNC, and until a match is found the data is read from
the disc but NOT transferred. Once the pattern has been seen, the data
following it is duly transferred, and the WORDEQUAL bit of DSKBYTR reports
the match. Thus the disc controller can be programmed to wait for the
synchronisation mark at the start of a data block.

Two other registers exist. These are DSKDAT (offset $026) which is used to
contain the data written to the disc by the DMA controller, and the DSKDATR
register (offset $008) which contains data read from the disc. DSKDATR is an
early-read register, NOT accessible by the 68000.


Hardware:Interfaces

There are three interfaces to handle here. These are the parallel interface,
the serial interface, and the analogue inputs to the gameports (which can be
used for other uses apart from game paddles).

The parallel interface:this is primarily controlled via the CIAs, & the data
lines are coupled to PB7-PB0 of CIA-A (i.e., CIAAPRB, whose address is
$BFE101). The PC output of CIA-A is connected to the DATA READY signal of
the handshake line, and the FLAG pin to the DATA ACKNOWLEDGE signal. Since
data register B of each CIA is equipped with handshaking built in, whereby
an access (read or write) causes PC to go low for one clock cycle, writing
to CIAAPRB automatically sends out a DATA READY signal. When the connected
device responds, it pulls the line connected to FLAG low to signal DATA
ACKNOWLEDGE, and the FLAG bit in the ICR is set.
Because of this, it is possible to process data output to the Centronics
interface via an interrupt routine, & allow programs to continue with other
processing while the interrupt routine handles the output - ideal for
printer spooling, for example.

The handshaking process can also be used for data INPUT from the Centronics
interface (assuming a bidirectional peripheral is connected!). PC and FLAG
are handled automatically by the CIA, and handling the FLAG interrupt is
virtually all that the programmer needs to do under normal circumstances.

CIA-B is used for the SELECT and BUSY signals, bit 2 of CIABPRA (at address
$BFD000) being SELECT, and bit 0 being BUSY. The BUSY signal is used for
communication with slow peripherals (e.g., printers), and the interrupt
routine can also wait for the BUSY signal to change before continuing output
to a printer.

The serial interface:the serial interface is controlled by a combination of
CIA registers and Paula registers. The CIA connections for the serial
interface are:

        /DTR signal : CIA-B PA7
        /RTS signal : CIA-B PA6
        /CD signal  : CIA-B PA5
        /CTS signal : CIA-B PA4
        /DSR signal : CIA-B PA3

All of these signals are sent through inverter logic as part of the RS-232
driver hardware, and so the signals are active low. Setting the
corresponding CIA-B bit to 0 sets the corresponding RS-232 line to high.
TAKE CAREFUL NOTE OF THIS! Forgetting the inverse logic of this RS-232
interface is a common source of interfacing problems.

When using RTS/CTS protocol, RTS should be made an output (set the
corresponding bit in CIABDDRA) and CTS an input (clear the appropriate DDRA
bit). I am not currently sure how to handle XON/XOFF, so until I have access
to the appropriate data, I shall leave XON/XOFF undocumented.

One feature of serial data transfer using RS-232 is that clock signals are
not provided. This means that both sender and receiver must provide their
own timing, and that the times must match for secure data transfer.
A set of standard baud rates exists for RS-232, typical values being 300
baud, 1200, 2400, 4800 and 9600 baud. Some fast peripherals (e.g., the new
modems for mainframe communications using SYSTEM-X exchanges) can have a
maximum baud rate of 38400 baud (but since they're £5,000 each, few readers
of this DOC file will have one coupled to their Amigas...) and high baud
rates are also a feature of experimental computer-moderated radio
transmissions using short-wave radio.

The serial interface controller, or UART (this admonitive acronym stands for
'Universal Asynchronous Receiver/Transmitter', which is a lot of fun to try
and say after several Bacardis) allows setting of the baud rate in the
SERPER register (custom chips, offset $032, write-only). This register also
controls the data length to some extent. Bit 15 (the LONG bit), if set,
makes the length of the receive data 9 bits instead of 8. The remaining 15
bits determine the baud rate.

Baud rate determination is indirect. Again, the number used is the number of
bus cycles (just as for audio data sampling rates) taken to transmit one bit
of data. If it takes N bus cycles to transmit a bit, the number N-1 must be
written to SERPER (for some perverse reason). So the relationship between
baud rate and the value to write into SERPER (here designated as S) is

        S = (1 / (B * 2.79365E-7)) - 1

where B is the baud rate, and 2.79365E-7 seconds (2.79365 times 10 to the
power -7) is the time taken for one bus cycle (279.365 nanoseconds). So, for
4800 baud transmit/receive rate, the value is

        S = (1 / (4800 * 2.79365E-7)) - 1

which is 744.74. Rounding to the nearest integer, we have 745. So to select
4800 baud we use

        move.w  #745,SERPER(a5)

for 8-bit data transfers, and

        move.w  #$8000+745,SERPER(a5)

for 9-bit receives.

Ok, we can now set the baud rate, and have access to the control signals.
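For those who would rather let the 68000 do the arithmetic, the subroutine
below is a minimal sketch of the SERPER calculation. The routine name, the
register usage and the CLOCK equate are my own inventions for illustration;
CLOCK is simply the reciprocal of the 279.365 nanosecond bus cycle quoted
above. Because DIVU only gives a 16-bit quotient and SERPER only has 15 bits
for the rate, the sketch assumes baud rates of 110 and upwards.

```asm
* sketch only - work out the SERPER value for a given baud rate
* in:  d0.w = baud rate, 110 upwards (my own calling convention)
* out: d0.w = value to write to SERPER for 8-bit receives
*      d1/d2 are corrupted
CLOCK   equ     3579545                 ;bus cycles per second, i.e.
                                        ;1/279.365 nanoseconds

serper  moveq   #0,d1
        move.w  d0,d1                   ;d1.l = baud rate
        move.l  d1,d2
        lsr.l   #1,d2                   ;half the baud rate, so that
        add.l   #CLOCK,d2               ;the division rounds to nearest
        divu    d1,d2                   ;d2.w = N, the bus cycle count
        subq.w  #1,d2                   ;SERPER holds N-1
        move.w  d2,d0
        rts

* example of use, assuming ptr to custom chips in A5
SERPER  equ     $032

setbaud move.w  #4800,d0
        bsr.s   serper                  ;d0 = 745 for 4800 baud
        move.w  d0,SERPER(a5)
        rts
```

For 4800 baud this yields 745, the same figure as the hand calculation. For
9-bit receives, OR $8000 into the result before writing it.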
The other registers we need are the Paula serial data registers
(Cross-reference #2):

Register  Offset  Function
--------  ------  --------
SERDAT    $030    Contains data to send (RS-232 output)
SERDATR   $018    Contains data to read (RS-232 input)

The SERDATR register for data reading (RS-232 input mode) has several bits
allocated to various functions. The bit assignments for SERDATR are:

Bit     Name      Function
---     ----      --------
15      OVRUN     Overrun of receiver shift register if set
14      RBF       Receive buffer full if set
13      TBE       Transmit buffer empty if set
12      TSRE      Transmit shift register empty if set
11      RXD       Matches level on RXD line
10                Unused
9       STP       Stop Bit Value
8       DB8       Depends on LONG in SERPER
7-0     DB7-0     Receive data buffer bits 7-0

SERDAT (offset $030) is used to contain the data to be transmitted from the
Amiga. Because of the time taken for serial data transfer using normal
RS-232 protocols, there is no provision for a data buffer pointer and a DMA
control read/write system to automatically read in or write out a block of
data of a given size, as the need was felt not to exist by the designers.
The maximum possible data transfer rate corresponds to a SERPER value of
zero, and equals approximately 3,580,000 baud! Not a regularly selected baud
rate...few applications need this kind of speed (usually confined to
military systems, which also possess inbuilt data encryption, reversible
frequency modulated perturbation of the data stream and other weird features
not present on the Amiga) and it is very unlikely that readers of this DOC
file will ever need it.

Two interrupts are provided for handling serial transfers. The RBF (Receive
Buffer Full) interrupt handled via the IPL5 interrupt vector is the
interrupt used for handling RS-232 transfers from the outside world to the
Amiga, and the TBE (Transmit Buffer Empty) interrupt handled via the IPL1
interrupt vector is used to handle RS-232 transfers from the Amiga to the
outside world.
Both interrupt vectors should be initialised appropriately and the interrupt
code should principally restrict itself to either sending or receiving a
byte of data. Configuring the baud rates and other allied functions should
be left to other routines called by the main program.

The procedures are as follows:

* Reading a byte of data from RS-232
* This is interrupt code, so put an RTE after it
* if using 'as is'. If using AmigaDos, there are
* other ways of doing it - see elsewhere.
* assumes ptr to custom chips in A5!!

        move.w  INTREQR(a5),d0          ;get interrupt request
        btst    #11,d0                  ;RBF interrupt?
        beq.s   not_RBF                 ;no
        move.w  SERDATR(a5),d0          ;get received data
        move.w  d0,ser_word             ;save it
        move.w  #$0800,INTREQ(a5)       ;and acknowledge interrupt
                                        ;(this is what clears RBF)
not_RBF ...

The RBF (Receive Buffer Full) bit is set in SERDATR and INTREQ/R whenever a
data word is transferred from Paula's internal shift register to the SERDATR
register. At this point, SERDATR should be read, to clear space for the next
incoming data word. The RBF bit in SERDATR is a mirror of the RBF bit in the
interrupt request register, so it is clearing that bit through INTREQ (a
write with bit 11 set and bit 15 clear) which resets RBF and signals that
the data word has been dealt with. Note that the value written to INTREQ
must be the bit mask, not the data word just read.

If SERDATR is not read, and the shift register has received another complete
data word, OVRUN is set. This signals that no more data can be received
because both SERDATR and the shift register contain inputted data. When
SERDATR is read under these conditions and RBF is cleared through INTREQ,
OVRUN is reset also. RBF is then immediately set again as the contents of
the internal shift register are loaded into SERDATR, allowing more data to
enter the shift register. Obviously, once RBF is set again, the data must be
read from SERDATR again.

The format of the data to be read is determined by SERDATR and SERPER. If
the LONG bit of SERPER is set, the data is 9 bits, else 8 bits. If the data
is 8 bits, then the ninth and tenth bits of the word (bits 8 and 9 in the
numbering of the SERDATR table above) mark the stop bits, if present, and
are set if there is a stop bit there.
If the data is 9 bits, the tenth bit of the word (bit 9 in the numbering of
the SERDATR table) is the stop bit if present, again set if a stop bit was
encountered. Note that all data is sent with one start bit, which always has
the value 0. This applies both to reception and transmission, and the
hardware detects the end of a data word by noting the transition from the 1
of a stop bit to the 0 of the next start bit.

Note that when the data transmitter has finished sending its data to the
Amiga, the RBF bit will never be set after the last word has been processed,
and the interrupt routine will never be called from this point on.

Usually, RS-232 systems signal this, either by sending a data byte or bytes
at the start indicating the size of the data block being transmitted, or by
sending a special 'end of transmission' character at the end of
transmission. Two typical choices are CTRL-D (known as the ASCII EOT
character) or CTRL-Z (EOF, or end of file, on many systems). If the
transmission consists of binary data instead of ASCII characters, then
either the start of the transmission must contain the byte count of the
block, or else another means of signalling end of transmission is required,
as any of the control characters could be a valid data byte, unless an
encoding scheme is used.

* Writing a byte of data to RS-232
* Again,interrupt code, pointer to the
* custom chips in A5.
* ser_word must already include the stop bit(s),
* in the formats given further on.

        move.w  INTREQR(a5),d0          ;get interrupt request
        btst    #0,d0                   ;TBE interrupt?
        beq.s   not_TBE                 ;no
        move.w  ser_word,d0             ;get data to send
        move.w  d0,SERDAT(a5)           ;send it
        move.w  #$0001,INTREQ(a5)       ;acknowledge interrupt
                                        ;(bit mask, not the data!)
not_TBE ...

The data to be output in this case is written to SERDAT. It is then
immediately transferred to the output internal shift register. This is
signalled by the TBE bit, which is set to indicate that SERDAT is able to
receive more data. Once TBE is set, more data should be written to SERDAT to
maintain the data flow.
Should this not occur, and the output shift register is emptied before
SERDAT is reloaded, then the TSRE (Transmit Shift Register Empty) bit is
set. This is cleared when SERDAT is loaded. TBE is then set again as soon as
the contents of SERDAT are passed on to the output shift register, allowing
more data to be written.

The format of the data to be sent is determined by SERDAT. The data formats
for different cases are:

        8 bit data, 1 stop bit  : 00000001 dddddddd
        8 bit data, 2 stop bits : 00000011 dddddddd
        9 bit data, 1 stop bit  : 0000001d dddddddd

To stop transmission of data, one bit of the ADKCON register (offset $09E)
is provided, the UARTBRK bit (bit 11). Setting this bit using the
instruction

        move.w  #$8800,ADKCON(a5)

(ADKCON uses the SETIT mechanism-see DMACON) stops serial data transfer and
clears TXD (the transmit data line) of the serial port.

Analogue inputs:the gameports possess two analogue inputs each, and it is
possible to connect game paddles to them or other analogue signal generating
equipment. Game paddles usually use a sliding or twist knob to change the
resistance of a potentiometer (known as a 'pot' for short - hence the use of
POTxxxx register names for this interface!).

Analogue joysticks, with a potentiometer for the X and the Y direction, can
also be connected. The values that these produce are read in the POTxDAT
registers (POT0DAT, offset $012, POT1DAT, offset $014). Bits 0-7 are used
for the X-value, and bits 8-15 for the Y-value.

Now, how does this work? Well, Paula contains a circuit to handle a simple
analogue-to-digital conversion. The requirement is that the maximum
resistance of the potentiometers should be 470 Kilohms (with a tolerance of
plus or minus ten percent). One side of the potentiometer is connected to
the 5-volt power supply, and the other to one of the analogue inputs. These
lead internally to Paula and to a capacitor, one for each input, connected
between the input and ground.
The paddle outputs are placed briefly at ground, discharging the capacitors.
Also, the counters in POTxDAT are cleared. For each raster line, the
counters are incremented by one while the capacitors are charged through the
resistors. When the voltage across the capacitor exceeds a preset value, the
corresponding counter is stopped. Thus the counter state is directly
proportional to the input resistance. Small values equal low resistances,
large values equal high resistances.

The POTGO register (Cross-reference #7) determines whether the analogue pins
are inputs or outputs. The bits are assigned as follows:

Bit     Name      Function
---     ----      --------
15      OUTRY     1=gameport 1 POTY bit is output, 0=input
14      DATRY     Gameport 1 POTY data bit
13      OUTRX     1=gameport 1 POTX bit is output, 0=input
12      DATRX     Gameport 1 POTX data bit
11      OUTLY     1=gameport 0 POTY bit is output, 0=input
10      DATLY     Gameport 0 POTY data bit
9       OUTLX     1=gameport 0 POTX bit is output, 0=input
8       DATLX     Gameport 0 POTX data bit
7-1               unused
0       START     Discharge capacitors & begin analogue measurement

A write access to POTGO (offset $034) clears both POTxDAT registers. POTGOR
also exists (offset $016) to allow the states to be read.

Normally, START is set to 1 at the start of the vertical blank interval, and
the valid potentiometer values can be read at the start of the next VBL,
immediately prior to setting START to 1 again.

If the corresponding OUTxx bit above is set to 1, the corresponding line is
treated as a digital output, and the corresponding DATxx bit is sent out
along it. If OUTxx=0, then DATxx in POTGOR yields the current state of those
lines as digital inputs.

Paddle buttons use the same bits as the joystick data registers (ho, hum!).
The assignments are:

                Gameport 0      Gameport 1
                ----------      ----------
Left Button     JOY0DAT bit 9   JOY1DAT bit 9
Right Button    JOY0DAT bit 1   JOY1DAT bit 1

For each of these, the bit is 1 if the button is pressed.
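The vertical blank procedure described above (read the finished values, then
set START again) might be sketched like this for gameport 0. Treat it as an
illustration under assumptions rather than tested code: the labels potx and
poty are my own, and writing the whole of POTGO like this also writes zeros
to every OUTxx and DATxx bit, which may not be what you want if those lines
are in use for mouse buttons.

```asm
* sketch only: fragment for a vertical blank interrupt routine
* assumes ptr to custom chips in A5
POT0DAT equ     $012
POTGO   equ     $034

vblpots move.w  POT0DAT(a5),d0          ;counters from the last frame
        move.w  d0,d1
        and.w   #$00FF,d0               ;bits 0-7  = X value
        lsr.w   #8,d1                   ;bits 8-15 = Y value
        move.w  d0,potx                 ;save both for the main program
        move.w  d1,poty
        move.w  #$0001,POTGO(a5)        ;START=1: discharge capacitors
        rts                             ;and begin the next measurement

potx    dc.w    0                       ;hypothetical storage words
poty    dc.w    0
```

Reading before restarting matters, since the write to POTGO clears the
counters.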
Hardware:Mouse, Keyboard, Joysticks

I assume that most of those readers requiring this file for hardware
documentation know what a keyboard, a mouse and a joystick are. Some may
even have dismantled several of these items (I have-much to the dismay of
those people whose equipment I have dissected!) and found out about the
inner workings of the basic hardware. However, there exist extra components
within these pieces of hardware whose function needs a little extra
explanation.

Modern keyboards are no longer simply a matrix of switches. Most of the
keyboards on modern computers have internal controller chips of their own
with their own RAM and ROM, allowing customisation and configuration of the
keyboard. IBM PCs (ugh!) use an Intel 8047 controller, Sinclair QLs (yeuck!)
use an Intel 8049, Atari STs (duh...) use a 6301 processor chip (this was at
one time the heart of expensive desktop computers!) and the Amiga? Well, the
Amiga uses the MOS 6500/1 processor.

So what? Well, a 6500/1 is really a 6502 processor, as found in the
PET/VIC-20/Commodore 64, with on-chip RAM and ROM. The ROM is
mask-programmed with the control program for the Amiga keyboard.
Cross-reference #1:the CIAs are coupled via several links to the 6500/1 for
keyboard communication.

So, if it wasn't for the mask programmed ROM, in theory anyone with a 6502
assembler and the motivation could write his own keyboard controller program
for the Amiga. Sad to say, the existence of a control program embedded in
ROM, plus the one-way data traffic (keyboard to Amiga) makes this
impossible. Atari ST keyboard controllers CAN be reprogrammed, but I warn
anyone tempted to try, it is HARD.

The 6500/1 has a 2K ROM, 64 bytes of static RAM, 4 bidirectional 8-bit
ports, a 16-bit counter with its own control input, and a clock generator of
its own. This chip is interfaced to the Amiga via two precision 556 timer
chips. These provide a reset signal for the Amiga.
The mechanism by which the reset mechanism is provided is interesting - more
later.

The 6500/1 reads the keyboard matrix via ports C and D (the 6500/1 has four
I/O ports on-chip) to obtain details about which key has been either pressed
or released. This information is converted from the bitwise port
information, into a raw key code passed out to the Amiga via port A. This
data is transmitted serially, and the requisite line from the 6500/1 is
connected to the SP input line of CIA-A. The CNT line of the same CIA
provides the keyboard system with the clock signal for signal
synchronisation.

Having scanned the matrix, the 6500/1 returns raw key codes whenever there
is a state change in the keyboard. If a key is pressed, the code for that
key is sent. If that key is released, the code for that key, with bit 7 set,
is sent to signal 'key released'. If a different key is pressed before the
release of the original key, the new keypress is sent first. The keyboard
events are sent in the order of occurrence where possible.

Special keys are wired into a different part of the matrix, so that keycode
clashes do not occur with the special keys SHIFT, CONTROL, ALT & the two
AMIGA keys. Both left and right SHIFT, left and right ALT, and left and
right AMIGA are given their own separate keycodes, making for a massive
number of keycoding possibilities.

CAPS LOCK also has its own key code, and is treated somewhat differently.
The 6500/1 simulates a push-button with this key, no release information
being supplied. The CAPS LOCK key state is only judged to have changed when
pressed - releasing the CAPS LOCK key is ignored by the 6500/1. When pressed
the first time, the LED lights up, and a code is sent corresponding to 'key
pressed'. The second time CAPS LOCK is pressed, at which point the LED
becomes unlit, the code for 'key released' (bit 7 set) is sent.
This is the only key treated thus - all other keys have their
pressed/released events handled in the normal way described above.

Key code groups are :

$00 - $3F : Normal ASCII letter keys in the main keyboard section
            between the grey ENTER and the CAPS LOCK key, plus the
            numeric keypad keys except ENTER.

$40 - $4F : Codes of standard special keys, such as the SPACE BAR,
            RETURN, TAB, BACKSPACE, DEL, ESC, numeric keypad ENTER.

$50 - $5F : Function keys F1-F10, HELP.

$60 - $6F : SHIFT, ALT, CONTROL, AMIGA and CAPS LOCK keys.

I shall supply the codes for the special keys individually. To work out the
other keys, it is possible to write a piece of code to scan the keyboard and
read the raw key codes. More of this later. The special key codes are :

Key             Code
---             ----
LEFT SHIFT      $60
RIGHT SHIFT     $61
CAPS LOCK       $62
CONTROL         $63
LEFT ALT        $64
RIGHT ALT       $65
LEFT AMIGA      $66
RIGHT AMIGA     $67

There are also special codes sent for certain special functions. These are:

Keycode Function
------- --------
$F9     Last key code sent was incorrect
$FA     Keyboard buffer of 6500/1 is full
$FC     Error in keyboard self test (AGH! REPAIR TIME!)
$FD     Start of keys held down on power up
$FE     End of keys held down on power up

The $F9 code is sent if there has been disruption of the keyboard linkage.
On the A500, this usually means time to get the Amiga mended, but on
A1000/A2000 machines this can result from unplugging a keyboard & then
plugging in a new keyboard while the Amiga is switched on. This allows the
keyboard resynchronisation system to be activated, re-establishing secure
communications.

The 6500/1 has an internal character buffer of 10 characters. If the buffer
becomes full (because software is not reading it quickly enough), the 6500/1
sends the $FA code to indicate a full keyboard buffer, and that subsequent
keypresses will be lost.

Keyboard data communications are always conducted from the keyboard to the
Amiga.
The Amiga sends handshaking signals to the keyboard and also a clock signal
for data transfer synchronisation. The actual order in which the bits are
sent is not the usual order of 76543210, but 65432107. So when the data byte
is received, it needs to be rotated one bit position to obtain the true
keycode. The CIA shift register of CIA-A contains the data once read, and
sends a level-2 interrupt once the data has been received. The level-2
interrupt code should then read that data byte, output the handshake pulse
and save the received code somewhere safe to be processed later by a user
program.

Keycodes furthermore are inverted, because the circuitry is designed as
ACTIVE LOW. This means that a low voltage corresponds to a 1, and a high
voltage corresponds to a 0. To obtain the keycode, the bits must be inverted
back to the normal form.

Basically, the 6500/1 puts the data bits on its data line (KDAT), plus a
20-microsecond low pulse on the clock line (KCLK). Between each of the data
pulses, 40-microsecond pauses are placed. Hence data transfer rate is 1 bit
every 60 microseconds, or about 16667 bits per second (16667 baud).

After the last data bit has been sent, the 6500/1 waits for a handshake
pulse. This is performed by the Amiga pulling the data line KDAT (the SP pin
of CIA-A) low for 75 microseconds the moment that the last data bit is
received.

To handle the keyboard interface, the kind of code I would use looks like
this, on occasions when the operating system is being bypassed:

* complete level 2 interrupt handler for CIA-A
* to initialise CIA-A to activate this routine properly
* initialise with CIAAICR = $88 (set the SP mask bit) and
* bit 6 of CIAACRA clear in the main program. This sets
* SPMODE = input, generate interrupt when SP full.
* Don't forget to point IPL2 vector to this code as well!
* assumes ptr to custom chips in A5

ciaint  move.w  #$2700,sr               ;lock out higher ints
                                        ;just in case
        move.l  d0,-(sp)                ;keep this
        move.w  INTREQR(a5),d0
        btst    #3,d0                   ;CIA interrupt?
        beq.s   ciaint_1                ;nope!
        move.w  #$0008,INTREQ(a5)       ;system acknowledge IRQ
        move.b  CIAAICR,d0              ;check CIA-A interrupt reg
        btst    #7,d0                   ;CIA interrupt proper?
        beq.s   ciaint_1                ;no, cock up somewhere so ignore
        btst    #3,d0                   ;SP data full?
        beq.s   ciaint_1                ;no, cock up somewhere so ignore
        addq.w  #1,ciacount             ;cia interrupt counter
        move.b  CIAASP,d0               ;get key code from keyboard
        or.b    #$40,CIAACRA            ;set SPMODE=output (pulls
                                        ;KDAT low!)
        not.b   d0
        ror.b   #1,d0
        move.b  d0,rawkey               ;proper raw key code
        moveq   #45,d0                  ;wait for 75 microsecs or so
ciaint_0 subq.w #1,d0                   ;while pulling KDAT low (about
        bne.s   ciaint_0                ;14 cycles/pass on a 7MHz 68000)
        and.b   #$BF,CIAACRA            ;SPMODE=input again
ciaint_1 move.l (sp)+,d0
        rte

ciacount dc.w   0
rawkey  dc.b    0,0

Keyboard reset mechanism:this is managed by the 6500/1. Pressing the
sequence of keys CTRL-AMIGA-AMIGA causes a hard reset. The 6500/1 control
program will sense this sequence, and pull KCLK low for about 0.5 seconds.
This tells the reset circuit of the Amiga to generate a hard reset. After
one or more of the keys are released, the 6500/1 also undergoes a reset,
rebooting its control program from scratch, signalled by flashing of the
CAPS LOCK LED.

Since KCLK is connected to the CNT pin of CIA-A, it is tempting to think
that software could generate a hard reset by holding that line low for 0.5
seconds or more. Note, however, that the handshake in the interrupt routine
above pulls the SP pin (KDAT) low, not CNT (KCLK), so merely increasing the
delay in that routine holds the data line low rather than the clock line.
Any scheme for a software reset by this route should be treated as unproven
until tested.

Mouse handling:the mouse counters are part of Denise. There are two
registers, called JOY0DAT (offset $00A) and JOY1DAT (offset $00C), one for
each of the gameports 0 and 1. Just to relieve the confusion, the back panel
of the A500 says 'Joystick port 1' and 'Joystick port 2'. Subtract 1 from
the numbers to get the appropriate gameport counter.

The high byte of each counter counts the vertical pulse count from 0 to 255,
the low byte the horizontal pulse count from 0 to 255. The mouse counters
count 200 pulses per inch (about 79 pulses/cm).
This makes the counters overflow after the mouse has moved just over 3 cm.
To counter this, word-wide counters should be set up in software, and the
actual gameport counters used to update these. This is normally done by the
operating system during the vertical blank interrupt. The following code is
an extract from a vertical-blank interrupt routine that I have used to
handle mouse counters:

;read mouse during VBL interrupt. mousex/y = old value of
;counters, mouseh/v = horiz/vert movement

        move.w  JOY0DAT(a5),d0          ;get mouse counter
        move.w  d0,newmouse             ;save for later
        and.w   #$FF,d0                 ;x counter
        sub.w   mousex,d0               ;difference new-old
        cmp.w   #-127,d0                ;wrapped up through 255?
        bge.s   vblmouse1               ;no
        add.w   #256,d0                 ;else diff+256
        bra.s   vblmouseh               ;store it
vblmouse1 cmp.w #127,d0                 ;wrapped down through 0?
        ble.s   vblmouseh               ;no
        sub.w   #256,d0                 ;else diff-256
vblmouseh move.w d0,mouseh              ;store horiz difference >0 = right

        move.w  newmouse,d0             ;get saved mouse counter
        lsr.w   #8,d0                   ;get vertical count
        sub.w   mousey,d0               ;difference new-old
        cmp.w   #-127,d0                ;wrapped up through 255?
        bge.s   vblmouse2               ;no
        add.w   #256,d0                 ;else diff+256
        bra.s   vblmousev               ;store it
vblmouse2 cmp.w #127,d0                 ;wrapped down through 0?
        ble.s   vblmousev               ;no
        sub.w   #256,d0                 ;else diff-256
vblmousev move.w d0,mousev              ;store vert difference >0 = down

        moveq   #0,d0
        move.w  newmouse,d0             ;get mouse counters
        lsl.l   #8,d0                   ;split across 2 words
        lsr.w   #8,d0                   ;isolate x
        move.w  d0,mousex               ;save
        swap    d0                      ;get y
        move.w  d0,mousey               ;save

newmouse dc.w   0
mousex  dc.w    0
mousey  dc.w    0
mouseh  dc.w    0
mousev  dc.w    0

The algorithm used is as follows:the assumption is made that the mouse
counters are not changed by more than 127 pulses between reads. Both old and
new values are maintained, and new compared with old. The value

        diff = new - old

is calculated. If 0 < diff <= 127, the mouse movement was either right or
down, without the counter wrapping round. If -127 <= diff < 0, the mouse
movement was either left or up, again without wrapping. If diff < -127,
movement was right or down, with the counter overflowing from 255 back to
0. If diff > 127, movement was left or up, with the counter underflowing
from 0 back to 255. Since the counters are 8 bits wide and so count modulo
256, the actual mouse movement for an overflow is computed as diff+256,
while for an underflow it is computed as diff-256. For example, an old
value of 250 and a new value of 5 gives diff = -245, and -245+256 = 11
pulses to the right.

To reset the mouse counters, use the JOYTEST register (offset $036). This
register is unusual. The register bit allocation is as follows:

        Y Y Y Y Y Y y y X X X X X X x x

The Y bits are the upper 6 bits of the vertical counter, and the X bits are
the upper 6 bits of the horizontal counter. The yy and xx bits are connected
directly to the mouse input signals, and are not located anywhere in memory.
So these values cannot be changed at all in software. JOYTEST has the effect
of resetting both sets of mouse counters, making JOY0DAT and JOY1DAT equal
in value. Whatever value is sent to JOYTEST is sent to both of the JOYxDAT
registers.

The mouse buttons are handled separately. If the mouse is attached to port
0, the signals occur as follows:

        LEFT BUTTON       : CIAAPRA, Bit 6
        RIGHT BUTTON      : DATLY of POTGO (#7)
        MIDDLE BUTTON (*) : DATLX of POTGO (#7)

For game port 1, the signals are:

        LEFT BUTTON       : CIAAPRA, Bit 7
        RIGHT BUTTON      : DATRY of POTGO (#7)
        MIDDLE BUTTON (*) : DATRX of POTGO (#7)

For all of these, a zero bit value means that the button is PRESSED.

Joysticks are handled in a similar way-they use the same counters & JOYTEST.
However, to sense the direction in which the joystick is moved, the software
algorithm differs. Basically, the following table holds:

        Joystick Right   : Bit 1 JOYxDAT = 1
        Joystick Left    : Bit 9 JOYxDAT = 1
        Joystick Back    : (Bit 1 EOR Bit 0) = 1
        Joystick Forward : (Bit 9 EOR Bit 8) = 1

The following piece of code can be used to generate direction indicators for
the joystick (here called DX for left/right, DY for up/down):

        move.w  JOY0DAT(a5),d0          ;get joystick value
        move.w  d0,d1                   ;& copy it
        lsr.w   #1,d1                   ;shift copy right
        moveq   #0,d2                   ;clear DX, DY
        btst    #1,d0                   ;bit 1 set (right) ?
        beq.s   notright                ;no
        move.w  #1,d2
notright btst   #9,d0                   ;bit 9 set (left) ?
        beq.s   notleft                 ;no
        move.w  #-1,d2
notleft swap    d2
        eor.w   d0,d1                   ;bit 0 = bit 1 EOR bit 0,
                                        ;bit 8 = bit 9 EOR bit 8
        btst    #0,d1                   ;result 1 (back) ?
        beq.s   notback
        move.w  #-1,d2
notback btst    #8,d1                   ;result 1 (forward) ?
        beq.s   notfront
        move.w  #1,d2
notfront swap   d2

Now D2 contains the value DX in the low word, DY in the high word, and the
programmer can handle this value ad lib. This method allows diagonal
joystick values to be managed (the example in the Amiga System Programmer's
Guide does not allow this) in a way that is useful for such things as games.

Note that to ensure diagonal values are read properly, it might be prudent
to embed the code above as a subroutine, save the value in D2 in D3 after
the first call, and then call the subroutine a few times more, each time
ORing the result of D2 into D3. This allows transient diagonal joystick
values to be strobed in case the joystick response is poor, a standard trick
used by games writers on systems with known poor joystick responses.

The joystick fire buttons for each port correspond to the left mouse buttons
in each case-the states of each are read from the same bits of the same
register (see mouse above).


Hardware:Some Notes

This file does NOT contain information about the Enhanced Chip Set (from
here on known as the ECS). The ECS is able to access a CHIP RAM range of 1MB
as opposed to the 512K of the standard chip set, and I assume this is
achieved by expanding the various pointer high word registers to 4 bits
wide, making a 20-bit address whole. Mind you, I have discovered that making
logical assumptions such as this about Commodore hardware can lead a
programmer right up shit creek without a paddle.

The ECS is also supposed to possess some other functional enhancements. The
exact details are not known to me at the time of writing.
Anyone possessing ECS information is requested to supply it to the
following address to allow me to maintain precise updates:

        Dave Edwards
        232 Hale Road
        WIDNES
        Cheshire
        WA8 8QA

The same applies to other DOC files in this series, which are also
obtainable by sending a blank disc plus SAE for return postage to

        Mark Meany
        1 Cromwell Road
        Polygon
        SOUTHAMPTON
        Hants
        SO1 2JH

marking your envelope "CLUB DISC 4" and enclosing a covering letter
explaining your requirements. Other useful files and software are also
available on the various Club Discs from Mark above, and any back copies
may be obtained (assuming that archive copies still exist) by marking
your envelope with the legend "Club Disc N" where N is the disc number.
Don't forget to include an SAE or better still a jiffy bag for return
postage, as neither he nor I are rich enough to provide a freepost
service!

Hardware:Logic Tutorial

For the mathematically minded, I present a logic tutorial allowing
derivation of minterms via the mechanism of developing alternational
normal schemata to create the individual minterm components.

Conventions: In this tutorial, AB is used to represent A AND B, A + B is
used to represent A OR B, and b is used to represent NOT B. In an
expression such as

        ABC + ABc + AbC

the AND operation takes precedence over the OR operation, and the NOT
operation applies only to that letter typed in lower case. The expression
(NOT B) AND (NOT C) is represented by

        bc

and this list of conventions is thus complete.

Rules: The following laws apply to all logical expressions:

1)      AB is the same as BA, and A + B is the same as B + A. This is
        the Commutative law.

2)      A(BC) and (AB)C are equivalent. Similarly, A + (B + C) is
        equivalent to (A + B) + C. This is the Associative law.

3)      The expression A(B+C) expands to AB + AC. This is the
        Distributive law of conjunction over alternation.
4)      The logical AND of any single term with an expression of the
        form (B + b) has no effect - this is the identity operation
        (equivalent to multiplying numbers by 1). So A is equivalent to
        A(B+b) and further equivalent to A(B+b)(C+c).

These fundamental laws apply to many mathematical systems other than
Boolean Algebra, in particular they apply to the arithmetic of integers,
rational numbers and real numbers.

Alternational Normal Schemata: This long-winded term describes any
logical expression which contains terms of the form ABC combined using
the OR operator. So the expression

        AB + ABc + ABCD + ABcD + Abcd

is an alternational normal schema, whereas

        (A+BC)D + ABCD

is not. The reason for the name is this: alternation is another name for
the OR operator in mathematical literature, and a normal schema is one in
which the format of the component terms obeys a strict set of rules.

A DEVELOPED alternational normal schema is one where all of the
components contain the same number of component variables. The creation
of a developed alternational normal schema involves scanning the
expression for the term with the maximum number of variables, and
expanding all terms deficient in variables until all terms match in
number of variables. An example:

        AB + C + b

becomes

        AB(C+c) + (A+a)(B+b)C + (A+a)b(C+c)

and then

        ABC + ABc + ABC + AbC + aBC + abC + AbC + Abc + abC + abc

Collecting together identical terms into single terms, and eliminating
the surplus terms, this becomes

        ABC + ABc + AbC + Abc + aBC + abC + abc

This is a developed alternational normal schema. All of the terms have
the same number of variables, and the expression is made up of terms of
the form ABC, all combined by alternation or the logical OR operation.

Now, the connection between developed alternational schemata and minterms
is simple - they're one and the same. This tutorial is simply a
demonstration of a more formal method of derivation for those with the
necessary background.
The example I have used to demonstrate the technique in action was chosen deliberately to illustrate this connection. Basically, all that a programmer does when picking minterms is an operation of the above kind, even if using a more intuitive and less formal method.
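
The development step can also be carried out mechanically, by trying
every combination of A, B and C and keeping those for which the
expression is true. What follows is a minimal sketch of that idea in C,
not one of the original routines in this file. The names example_expr,
develop and format_schema are hypothetical, invented for this
illustration. The bit order used for the result (bit 7 for ABC down to
bit 0 for abc) is assumed to match the order of the blitter's minterm
byte; check that against the blitter section before relying on it.

```c
#include <string.h>

/* The example expression from the tutorial: AB + C + b.
   Each argument is 0 or 1. (Hypothetical helper for illustration.) */
static int example_expr(int a, int b, int c)
{
    return (a && b) || c || !b;
}

/* Develop a 3-variable expression into its minterms by evaluating it
   for all eight combinations of A, B and C.
   Bit n of the result is set when the expression is true for the
   combination with A = bit 2 of n, B = bit 1 of n, C = bit 0 of n.
   So bit 7 stands for ABC, bit 6 for ABc, ... and bit 0 for abc
   (assumed here to be the blitter's minterm order). */
static unsigned develop(int (*expr)(int, int, int))
{
    unsigned mask = 0;
    int n;

    for (n = 0; n < 8; n++) {
        int a = (n >> 2) & 1;
        int b = (n >> 1) & 1;
        int c = n & 1;
        if (expr(a, b, c))
            mask |= 1u << n;
    }
    return mask;
}

/* Write the developed schema for a mask in the tutorial's notation
   (upper case = variable, lower case = NOT variable).
   buf must hold at least 46 characters: 8 terms of 3 letters,
   7 separators of 3 characters, and the terminating NUL. */
static void format_schema(unsigned mask, char *buf)
{
    int n, len = 0;

    buf[0] = '\0';
    for (n = 7; n >= 0; n--) {
        if (!(mask & (1u << n)))
            continue;               /* this minterm is not present */
        if (len > 0) {
            strcpy(buf + len, " + ");
            len += 3;
        }
        buf[len++] = (n & 4) ? 'A' : 'a';
        buf[len++] = (n & 2) ? 'B' : 'b';
        buf[len++] = (n & 1) ? 'C' : 'c';
        buf[len] = '\0';
    }
}
```

Calling develop(example_expr) gives the mask $FB: every minterm except
aBc, which is the one combination (A clear, B set, C clear) for which
AB + C + b is false. Passing that mask to format_schema produces the
same seven terms as the hand derivation above. Enumerating the cases
is less elegant than applying the laws by hand, but it leaves no room
for dropping or duplicating a term.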